Abstract. Szemerédi's regularity lemma can be viewed as a rough
structure theorem for arbitrary dense graphs, decomposing such graphs
into a structured piece (a partition into cells with edge densities), a small
error (corresponding to irregular cells), and a uniform piece (the
pseudorandom deviations from the edge densities). We establish an
arithmetic regularity lemma that similarly decomposes bounded functions
$f : [N] \to \mathbb{C}$ into a (well-equidistributed, virtual) $s$-step nilsequence,
an error which is small in $L^2$, and a further error which is minuscule
in the Gowers $U^{s+1}$-norm, where $s \geq 1$ is a parameter. We then
establish a complementary arithmetic counting lemma that counts arithmetic
patterns in the nilsequence component of $f$.

We provide a number of applications of these lemmas: a proof of 
Szemerédi's theorem on arithmetic progressions, a proof of a conjecture
of Bergelson, Host and Kra, and a generalisation of certain results of 
Gowers and Wolf. 

Our result is dependent on the inverse conjecture for the Gowers $U^{s+1}$
norm, recently established for general s by the authors and T. Ziegler. 



To Endre Szemerédi on the occasion of his 70th birthday.
1. Introduction

Szemerédi's celebrated regularity lemma [46, 47] is a fundamental tool in
graph theory; see for instance [34] for a survey of some of its many
applications. It is often described as a structure theorem for graphs
$G = (V, E)$, but one may also view it as a decomposition for arbitrary
functions $f : V \times V \to [0,1]$. For instance, one can recast the
regularity lemma in the following "analytic" form. Define a growth function
to be any monotone increasing function $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$
with $\mathcal{F}(M) \geq M$ for all $M$.

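Lemma 1.1 (Szemerédi regularity lemma, analytic form). Let $V$ be a finite
vertex set, let $f : V \times V \to [0,1]$ be a function, let $\varepsilon > 0$,
and let $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth function. Then
there exists a positive integer$^1$ $M = O_{\varepsilon,\mathcal{F}}(1)$
and a decomposition

$$ f = f_{\mathrm{str}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \tag{1.1} $$

of $f$ into functions $f_{\mathrm{str}}, f_{\mathrm{sml}}, f_{\mathrm{unf}} : V \times V \to [-1,1]$ such that:

($f_{\mathrm{str}}$ structured) $V$ can be partitioned into $M$ cells $V_1, \ldots, V_M$, such that
$f_{\mathrm{str}}$ is constant on $V_i \times V_j$ for all $i,j$ with $1 \leq i,j \leq M$;

($f_{\mathrm{sml}}$ small) The quantity$^2$ $\|f_{\mathrm{sml}}\|_{L^2(V \times V)} := (\mathbb{E}_{v,w \in V} |f_{\mathrm{sml}}(v,w)|^2)^{1/2}$ is at
most $\varepsilon$;

($f_{\mathrm{unf}}$ very uniform) The box norm $\|f_{\mathrm{unf}}\|_{\Box^2(V \times V)}$, defined to be the
quantity

$$ \left( \mathbb{E}_{v_1,v_2,w_1,w_2 \in V} f_{\mathrm{unf}}(v_1,w_1) f_{\mathrm{unf}}(v_1,w_2) f_{\mathrm{unf}}(v_2,w_1) f_{\mathrm{unf}}(v_2,w_2) \right)^{1/4}, $$

is at most $1/\mathcal{F}(M)$;

(Nonnegativity) $f_{\mathrm{str}}$ and $f_{\mathrm{str}} + f_{\mathrm{sml}}$ take values in $[0,1]$.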

Informally, this regularity lemma decomposes any bounded function into 
a structured part, a small error, and an extremely uniform error. While 
this formulation does not, at first sight, look much like the usual regularity
lemma, it easily implies that result: see [51]. The idea of formulating
the regularity lemma with an arbitrary growth function $\mathcal{F}$ first appears in
PQ, and is also very useful for generalisations of the regularity lemma to 
hypergraphs. See, for example, [50]. The bound on $M$ turns out to essentially
be an iterated version of the growth function $\mathcal{F}$, with the number of
iterations being polynomial in $1/\varepsilon$. In applications, one usually selects the
growth function to be exponential in nature, which then makes $M$ essentially
tower-exponential in $1/\varepsilon$. See [49], [52] for a general discussion of these
sorts of structure theorems and their applications in combinatorics. See also
[40] for a related analytical perspective on the regularity lemma.

1 As usual, we use $O(X)$ to denote a quantity bounded in magnitude by $CX$ for some
absolute constant $C$; if we need $C$ to depend on various parameters, we will indicate this
by subscripts. Thus for instance $O_{\varepsilon,\mathcal{F}}(1)$ is a quantity bounded in magnitude by some
expression $C_{\varepsilon,\mathcal{F}}$ depending on $\varepsilon, \mathcal{F}$.

2 We use here the expectation notation $\mathbb{E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)$ for any finite
non-empty set $A$, where $|A|$ denotes the cardinality of $A$.
In applications the regularity lemma is often paired with a counting lemma
that allows one to control various expressions involving the function $f$. For
example, one might consider the expression

$$ \mathbb{E}_{u,v,w \in V} f(u,v) f(v,w) f(w,u), \tag{1.2} $$

which counts triangles in $V$ weighted by $f$. Applying the decomposition
(1.1) splits expressions such as (1.2) into multiple terms (in this instance,
27 of them). The key fact, which is a slightly non-trivial application of
the Cauchy-Schwarz inequality, is that the terms involving the
box-norm-uniform error $f_{\mathrm{unf}}$ are negligible if the growth function $\mathcal{F}$ is chosen
to grow rapidly enough. The terms involving the small error $f_{\mathrm{sml}}$ are somewhat
small, but one often has to carefully compare those errors against the main
term (which only involves $f_{\mathrm{str}}$) in order to get a non-trivial bound on the
final expression (1.2). In particular, one often needs to exploit the positivity
of $f_{\mathrm{str}}$ and $f_{\mathrm{str}} + f_{\mathrm{sml}}$ to first localise expressions such as (1.2) to a
small region (such as the portion of a graph between a "good" triple
$V_i, V_j, V_k$ of cells in the partition of $V$ associated to $f_{\mathrm{str}}$) before one
can obtain a useful estimate.

The graph regularity and counting lemmas can be viewed as the first non- 
trivial member of a hierarchy of hypergraph regularity and counting lemmas, 
see e.g. [HI [TTJ, [T8J, HU [42l [50]. The formulation in [50] is particularly close 
to the formulation given in Lemma 1.1. These lemmas are suitable for
controlling higher order expressions such as

$$ \mathbb{E}_{u,v,w,x \in V} f(u,v,w) f(v,w,x) f(w,x,u) f(x,u,v). $$

Our objective in this paper is to introduce an analogous hierarchy of such
regularity and counting lemmas (one for each integer $s \geq 1$), in arithmetic
situations. Here, the aim is to decompose a function $f : [N] \to [0,1]$ defined
on an arithmetic progression $[N] := \{1, \ldots, N\}$ instead of a graph. One is
interested in counting averages such as

$$ \mathbb{E}_{n,r \in [N]} f(n) f(n+r) f(n+2r), $$

which counts 3-term arithmetic progressions weighted by $f$, as well as higher
order expressions such as

$$ \mathbb{E}_{n,r \in [N]} f(n) f(n+r) f(n+2r) f(n+3r). $$

As it turns out, the former average will be best controlled using the $s = 1$
regularity and counting lemmas, while the latter requires the $s = 2$ versions
of these lemmas. In this paper we shall see several examples of these types
of applications of the two lemmas.
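As a concrete illustration of the averages above, the following minimal Python sketch evaluates $\mathbb{E}_{n,r \in [N]} f(n) f(n+r) \cdots f(n+(k-1)r)$ by brute force. The name `ap_average` and the convention that $f$ vanishes outside $[N]$ are assumptions made for this illustration only; they are not notation from the text.

```python
def ap_average(f, N, k):
    """Brute-force average of f(n) f(n+r) ... f(n+(k-1)r) over n, r in {1, ..., N}.

    Assumption (not from the text): f is extended by zero outside {1, ..., N}.
    """
    def value(m):
        return f(m) if 1 <= m <= N else 0.0

    total = 0.0
    for n in range(1, N + 1):
        for r in range(1, N + 1):
            term = 1.0
            for j in range(k):
                term *= value(n + j * r)
            total += term
    return total / N ** 2


# Weighted count of 3-term progressions for the indicator of the even numbers.
evens = lambda m: 1.0 if m % 2 == 0 else 0.0
print(ap_average(evens, 20, 3))
```

Taking `k = 3` gives the first average displayed above and `k = 4` the second. The double loop costs on the order of $k N^2$ operations, so this is only meant for small experiments.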

The arithmetic regularity lemma. We begin by formulating
our regularity lemma. Following the statement we explain the terms used
here.

Theorem 1.2 (Arithmetic regularity lemma). Let $f : [N] \to [0,1]$ be a
function, let $s \geq 1$ be an integer, let $\varepsilon > 0$, and let
$\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth function. Then there exists
a quantity $M = O_{s,\varepsilon,\mathcal{F}}(1)$ and a decomposition

$$ f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} $$

of $f$ into functions $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}} : [N] \to [-1,1]$ of the following form:

($f_{\mathrm{nil}}$ structured) $f_{\mathrm{nil}}$ is a $(\mathcal{F}(M), N)$-irrational virtual nilsequence of
degree $\leq s$, complexity $\leq M$, and scale $N$;

($f_{\mathrm{sml}}$ small) $f_{\mathrm{sml}}$ has an $L^2[N]$ norm of at most $\varepsilon$;

($f_{\mathrm{unf}}$ very uniform) $f_{\mathrm{unf}}$ has a $U^{s+1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(Nonnegativity) $f_{\mathrm{nil}}$ and $f_{\mathrm{nil}} + f_{\mathrm{sml}}$ take values in $[0,1]$.

Remark. This result easily implies the recently proven inverse conjecture
for the Gowers norms (Theorem 2.1). Conversely, this inverse conjecture,
together with the equidistribution theory of nilsequences, will be the main
ingredient used to prove Theorem 1.2.

We prove this theorem in §2. We turn now to a discussion of the various
concepts used in the above statement. Readers who are interested in
applications may skip ahead to the end of the section.

The $L^2[N]$ norm, used to control $f_{\mathrm{sml}}$, is simply

$$ \|f\|_{L^2[N]} := \left( \mathbb{E}_{n \in [N]} |f(n)|^2 \right)^{1/2}. $$

We turn next to the Gowers uniformity norm $U^{s+1}[N]$, used to control
$f_{\mathrm{unf}}$. If $f : G \to \mathbb{C}$ is a function on a finite additive group $G$, and $k \geq 1$
is an integer, then the Gowers uniformity norm $\|f\|_{U^k(G)}$ is defined by the
formula

$$ \|f\|_{U^k(G)} := \left( \mathbb{E}_{x,h_1,\ldots,h_k \in G} \Delta_{h_1} \cdots \Delta_{h_k} f(x) \right)^{1/2^k}, $$

where $\Delta_h f : G \to \mathbb{C}$ is the multiplicative derivative of $f$ in the direction $h$,
defined by the formula

$$ \Delta_h f(x) := f(x+h) \overline{f(x)}. $$

In this paper we will be concerned with functions on $[N]$, which is not
quite a group. To define the Gowers norms of a function $f : [N] \to \mathbb{C}$, set
$G := \mathbb{Z}/\tilde{N}\mathbb{Z}$ for some integer $\tilde{N} \geq 2^k N$, define a function $\tilde{f} : G \to \mathbb{C}$ by
$\tilde{f}(x) = f(x)$ for $x = 1, \ldots, N$ and $\tilde{f}(x) = 0$ otherwise, and set
$\|f\|_{U^k[N]} := \|\tilde{f}\|_{U^k(G)} / \|1_{[N]}\|_{U^k(G)}$, where $1_{[N]}$ is the indicator function
of $[N]$. It is easy to see that this definition is independent of the choice of
$\tilde{N}$, and so for definiteness one could take $\tilde{N} := 2^k N$. Henceforth we shall
write simply $\|f\|_{U^k}$, rather than $\|f\|_{U^k[N]}$, since all Gowers norms will be on
$[N]$. One can show that $\|\cdot\|_{U^k}$ is indeed a norm for any $k \geq 2$, though we
shall not need this here; see [16]. For further discussion of the Gowers norms and
their relevance to counting additive patterns see [16], [25, §5] or [53, §11].
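The definition of the Gowers norm can be made concrete with a small brute-force computation. The Python sketch below expands $\Delta_{h_1} \cdots \Delta_{h_k} f(x)$ as a product over the $2^k$ vertices of a cube, conjugating a vertex when $k - |\omega|$ is odd, and then applies the normalisation by $1_{[N]}$ with $\tilde{N} = 2^k N$. The function names `gowers_norm` and `gowers_norm_interval` are hypothetical, chosen for this illustration; the cost grows like $n^{k+1} 2^k$, so it is only usable for very small inputs.

```python
import cmath
from itertools import product


def gowers_norm(f, k):
    """Gowers U^k norm of f on Z/nZ, where f is a list of n complex values.

    Expands Delta_{h_1}...Delta_{h_k} f(x) as a product over cube vertices;
    the value at vertex omega is conjugated when k - |omega| is odd.
    """
    n = len(f)
    total = 0j
    for x in range(n):
        for hs in product(range(n), repeat=k):
            term = 1 + 0j
            for omega in product((0, 1), repeat=k):
                point = (x + sum(w * h for w, h in zip(omega, hs))) % n
                val = complex(f[point])
                if (k - sum(omega)) % 2 == 1:
                    val = val.conjugate()
                term *= val
            total += term
    average = total / n ** (k + 1)
    # In exact arithmetic the average is real and non-negative.
    return max(average.real, 0.0) ** (1.0 / 2 ** k)


def gowers_norm_interval(values, k):
    """U^k[N] norm of a function on {1, ..., N}, given as a list of N values."""
    N = len(values)
    big = 2 ** k * N  # the choice of modulus 2^k N mentioned in the text
    padded = [0j] * big
    indicator = [0j] * big
    for i, v in enumerate(values, start=1):
        padded[i] = complex(v)
        indicator[i] = 1 + 0j
    return gowers_norm(padded, k) / gowers_norm(indicator, k)


# A linear phase on Z/5Z: its U^2 norm equals 1, since the phases cancel.
character = [cmath.exp(2j * cmath.pi * x / 5) for x in range(5)]
print(gowers_norm(character, 2))
```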

Finally, we turn to the notion of an irrational virtual nilsequence, which is
the concept that defines the structural component $f_{\mathrm{nil}}$. This is the most
complicated concept, and requires a certain number of preliminary definitions.
We first need the notion of a filtered nilmanifold. The first two sections of
[28] may be consulted for a more detailed discussion.
Definition 1.3 (Filtered nilmanifold). Let $s \geq 1$ be an integer. A filtered
nilmanifold $G/\Gamma = (G/\Gamma, G_\bullet)$ of degree $s$ consists of the following data:

A connected, simply-connected nilpotent Lie group $G$;

A discrete, cocompact subgroup $\Gamma$ of $G$ (thus the quotient space $G/\Gamma$ is
a compact manifold, known as a nilmanifold);

A filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$ of closed connected subgroups

$$ G = G_{(0)} = G_{(1)} \geq G_{(2)} \geq \ldots $$

of $G$, which are rational in the sense that the subgroups $\Gamma_{(i)} := \Gamma \cap G_{(i)}$ are
cocompact in $G_{(i)}$, such that $[G_{(i)}, G_{(j)}] \subseteq G_{(i+j)}$ for all $i,j \geq 0$, and such
that $G_{(i)} = \{\mathrm{id}\}$ whenever $i > s$;

A Mal'cev basis$^3$ $\mathcal{X} = (X_1, \ldots, X_{\dim(G)})$ adapted to $G_\bullet$.

Once a Mal'cev basis has been specified, notions such as the rationality
of subgroups may be quantified in terms of it. Furthermore one may use a
Mal'cev basis to define a metric $d_{G/\Gamma}$ on the nilmanifold $G/\Gamma$. The results of
this paper are rather insensitive to the precise metric that one takes, but one
may proceed for example as in [28, Definition 2.2]. We encourage the reader
not to think too carefully about the precise definition (or about Mal'cev
bases in general), but it is certainly important to have some definite metric
in mind so that one can make sense of notions such as that of a Lipschitz
function on $G/\Gamma$.

Observe that every filtered nilmanifold $G/\Gamma$ comes with a canonical
probability Haar measure $\mu_{G/\Gamma}$, defined as the unique Borel probability
measure on $G/\Gamma$ that is invariant under the left action of $G$. We abbreviate
$\int_{G/\Gamma} F(x)\, d\mu_{G/\Gamma}(x)$ as $\int_{G/\Gamma} F$.

We will need a quantitative notion of complexity for filtered nilmanifolds, 
though once again, the precise definition is somewhat unimportant. 

Definition 1.4 (Complexity). Let $M \geq 1$. We say that a filtered
nilmanifold $G/\Gamma = (G/\Gamma, G_\bullet)$ has complexity $\leq M$ if the dimension of $G$, the degree
of $G_\bullet$, and the rationality of the Mal'cev basis $\mathcal{X}$ (cf. [28, Definition 2.4])
are bounded by $M$.

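Heisenberg example. The model example of a degree $\leq 2$ filtered
nilmanifold is the Heisenberg nilmanifold

$$ G/\Gamma := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} \Big/ \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} $$

with the lower central series $G_{(0)} = G_{(1)} = G$ and

$$ G_{(2)} = [G,G] = \begin{pmatrix} 1 & 0 & \mathbb{R} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$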

3 A Mal'cev basis is a basis $X_1, \ldots, X_{\dim(G)}$ of the Lie algebra of $G$ that exponentiates
to elements of $\Gamma$, such that $X_j, \ldots, X_{\dim(G)}$ span a Lie algebra ideal for all $1 \leq j \leq \dim(G)$,
and $X_{\dim(G) - \dim(G_{(i)}) + 1}, \ldots, X_{\dim(G)}$ spans the Lie algebra of $G_{(i)}$ for all $1 \leq i \leq s$. For
a detailed discussion of this concept, see [28, §2].
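with Mal'cev basis $\mathcal{X} = \{X_1, X_2, X_3\}$ consisting of the matrices

$$ X_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad X_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad X_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. $$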
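The filtration property in the Heisenberg example can be checked by direct matrix arithmetic. The following Python sketch multiplies $3 \times 3$ upper unitriangular matrices and verifies that a commutator lands in the central subgroup $G_{(2)}$. The helper names (`heis`, `mul`, `inv`, `commutator`, `in_G2`) and the commutator convention $g h g^{-1} h^{-1}$ are assumptions of this illustration.

```python
def heis(x, y, z):
    """Upper unitriangular matrix with x, y above the diagonal and z in the corner."""
    return [[1, x, z], [0, 1, y], [0, 0, 1]]


def mul(a, b):
    """Ordinary 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inv(a):
    """Inverse of a matrix of the form heis(x, y, z)."""
    x, y, z = a[0][1], a[1][2], a[0][2]
    return heis(-x, -y, x * y - z)


def commutator(g, h):
    """Group commutator g h g^{-1} h^{-1} (one of several common conventions)."""
    return mul(mul(g, h), mul(inv(g), inv(h)))


def in_G2(a):
    """True when only the top-right corner entry may be non-zero off the diagonal."""
    return a[0][1] == 0 and a[1][2] == 0


g, h = heis(1, 2, 3), heis(4, 5, 6)
c = commutator(g, h)
print(c, in_G2(c))
```

Since elements of $G_{(2)}$ are central, a further commutator such as `commutator(c, g)` is the identity, which reflects $G_{(3)} = \{\mathrm{id}\}$ for a degree $\leq 2$ filtration.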

With the definition of filtered nilmanifold in place, the next thing we need
is the idea of a polynomial sequence. The basic theory of such sequences was
laid out in Leibman [35], and was extended slightly to general filtrations in
[28]. An extensive discussion may be found in Section 6 of that paper.

Definition 1.5 (Polynomial sequence). Let $(G/\Gamma, G_\bullet)$ be a filtered
nilmanifold, with filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$. A (multidimensional) polynomial
sequence adapted to this filtered nilmanifold is a sequence $g : \mathbb{Z}^D \to G$ for
some $D \geq 1$ with the property that

$$ \partial_{h_1} \cdots \partial_{h_i} g(n) \in G_{(i)} $$

for all $i \geq 0$ and $h_1, \ldots, h_i, n \in \mathbb{Z}^D$, where $\partial_h g(n) := g(n+h) g(n)^{-1}$ is the
derivative of $g$ with respect to the shift $h$. The space of all such polynomial
sequences will be denoted $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$. The space of polynomial sequences
taking values in $\Gamma$ will be denoted $\mathrm{poly}(\mathbb{Z}^D, \Gamma_\bullet)$. When $D = 1$, we refer to
multidimensional polynomial sequences simply as polynomial sequences.

Remark. We will be primarily interested in the one-dimensional case
$D = 1$, but will need the higher $D$ case in order to establish the counting
lemma, Theorem 1.11.

One of the main reasons why we work with polynomial sequences, instead
of just linear sequences such as $n \mapsto g_0 g_1^n$, is that the former forms a group.

Theorem 1.6 (Lazard-Leibman). If $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold and
$D \geq 1$ is an integer, then $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a group (and $\mathrm{poly}(\mathbb{Z}^D, \Gamma_\bullet)$ is a
subgroup).

Proof. See [36] or [28, Proposition 6.2]. □

With the concept of a polynomial sequence in hand, it is easy to define a 
polynomial orbit. 

Definition 1.7 (Orbits). Let $D, s \geq 1$ be integers, and $M, A > 0$ be
parameters. A (multidimensional) polynomial orbit of degree $\leq s$ and
complexity $\leq M$ is any function$^4$ $n \mapsto g(n)\Gamma$ from $\mathbb{Z}^D \to G/\Gamma$, where $(G/\Gamma, G_\bullet)$
is a filtered nilmanifold of complexity $\leq M$, and $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a
(multidimensional) polynomial sequence.

Using the concept of polynomial orbit, we can define the notion of a 
(polynomial) nilsequence, as well as a generalisation which we call a virtual 
nilsequence, in analogy with virtually nilpotent groups (groups with a finite 
index nilpotent subgroup). 



4 Strictly speaking, the orbit is the tuple of data $(G, \Gamma, G/\Gamma, G_\bullet, n \mapsto g(n)\Gamma)$, rather
than just the sequence $n \mapsto g(n)\Gamma$, but we shall abuse notation and use the sequence as a
metonym for the whole orbit.
Definition 1.8 (Nilsequences). A (multidimensional, polynomial)
nilsequence of degree $\leq s$ and complexity $\leq M$ is any function $f : \mathbb{Z}^D \to \mathbb{C}$
of the form $f(n) = F(g(n)\Gamma)$, where $n \mapsto g(n)\Gamma$ is a polynomial orbit of
degree $\leq s$ and complexity $\leq M$, and $F : G/\Gamma \to \mathbb{C}$ is a function of Lipschitz
norm$^5$ at most $M$.

Definition 1.9 (Virtual nilsequences). Let $N \geq 1$. A virtual nilsequence of
degree $\leq s$ and complexity $\leq M$ at scale $N$ is any function $f : [N] \to \mathbb{C}$ of
the form $f(n) = F(g(n)\Gamma, n \bmod q, n/N)$, where $1 \leq q \leq M$ is an integer,
$n \mapsto g(n)\Gamma$ is a polynomial orbit of degree $\leq s$ and complexity $\leq M$, and
$F : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R} \to \mathbb{C}$ is a function of Lipschitz norm at most $M$. (Here
we place a metric on $G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R}$ in some arbitrary fashion, e.g. by
embedding $\mathbb{Z}/q\mathbb{Z}$ in $\mathbb{R}/\mathbb{Z}$ and taking the direct sum of the metrics on the
three factors.)
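To make the shape of Definition 1.9 concrete, here is a minimal Python sketch of the simplest possible case: degree 1, with $G = \mathbb{R}$, $\Gamma = \mathbb{Z}$ and the linear sequence $g(n) = \alpha n$, so that $g(n)\Gamma$ is the point $\alpha n \bmod 1$ on the circle. The names `virtual_nilsequence` and `example_F`, and the particular choice of $F$, are hypothetical and serve only to show how the three arguments $g(n)\Gamma$, $n \bmod q$ and $n/N$ enter.

```python
import math


def virtual_nilsequence(alpha, q, N, F):
    """Return n -> F(alpha*n mod 1, n mod q, n/N), a degree 1 toy model.

    Assumption of this sketch: G = R, Gamma = Z, g(n) = alpha * n.
    """
    def f(n):
        x = (alpha * n) % 1.0  # the orbit point g(n)Gamma on R/Z
        return F(x, n % q, n / N)
    return f


def example_F(x, residue, t):
    """A smooth function of x, an arbitrary function of the residue, linear in t."""
    weight = 1.0 if residue == 0 else 0.5
    return math.cos(2 * math.pi * x) * weight * t


f = virtual_nilsequence(alpha=0.25, q=2, N=8, F=example_F)
print([round(f(n), 3) for n in range(1, 9)])
```

The dependence on $n \bmod q$ and on $n/N$ is what distinguishes a virtual nilsequence from the nilsequences of Definition 1.8; dropping those two arguments of `F` recovers an ordinary degree 1 nilsequence.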

One concept that featured in Theorem 1.2 remains to be defined: that of
an irrational orbit. The definition is a little technical and takes some setting
up, and so we defer it and the discussion of some motivating examples
to Appendix A. Very roughly speaking, an irrational orbit is one that is
equidistributed and for which the filtration $G_\bullet$ is as small as possible.

This concludes our attempt to discuss all the concepts involved in the
arithmetic regularity lemma, Theorem 1.2; we turn now to a statement and
discussion of the counting lemma.

Counting lemma. In applications of the arithmetic regularity lemma,
we will be interested in counting additive patterns such as arithmetic
progressions or parallelepipeds. To understand the phenomena properly it is
advantageous to work in a somewhat general setting similar to that taken in
[20], [21], [22], [29]. In the latter paper one works with a family $\Psi = (\psi_1, \ldots, \psi_t)$
of integer-coefficient linear forms (or equivalently, group homomorphisms)
$\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$ and considers expressions such as

$$ \sum_{n \in \mathbb{Z}^D \cap P} f(\psi_1(n)) \cdots f(\psi_t(n)) \tag{1.3} $$

where $P$ is a convex subset of $\mathbb{R}^D$. Thus, for instance, if counting arithmetic
progressions, one might use the linear forms

$$ \psi_i(n_1, n_2) := n_1 + (i-1) n_2; \quad i = 1, \ldots, k \tag{1.4} $$

whilst for counting parallelepipeds one might instead use the linear forms

$$ \psi_{\omega_1,\ldots,\omega_k}(n_0, n_1, \ldots, n_k) := n_0 + \omega_1 n_1 + \ldots + \omega_k n_k; \quad \omega_1, \ldots, \omega_k \in \{0,1\}. \tag{1.5} $$



5 The (inhomogeneous) Lipschitz norm $\|F\|_{\mathrm{Lip}}$ of a function $F : X \to \mathbb{C}$ on a metric
space $X = (X, d)$ is defined as

$$ \|F\|_{\mathrm{Lip}} := \sup_{x \in X} |F(x)| + \sup_{x,y \in X : x \neq y} \frac{|F(x) - F(y)|}{d(x,y)}. $$
In order to understand the contribution to (1.3) coming from the
structured part $f_{\mathrm{nil}}$ of $f$, one is soon faced with the question of understanding
the equidistribution of the orbit

$$ (g(\psi_1(n))\Gamma, \ldots, g(\psi_t(n))\Gamma) \tag{1.6} $$

inside $(G/\Gamma)^t$, where $n = (n_1, \ldots, n_D)$ ranges over $\mathbb{Z}^D \cap P$. We abbreviate
this orbit as $g^\Psi(n)\Gamma^t$, where $g^\Psi : \mathbb{Z}^D \to G^t$ is the polynomial sequence

$$ g^\Psi(n) := (g(\psi_1(n)), \ldots, g(\psi_t(n))). \tag{1.7} $$

A very useful model for this question, in which infinite orbits were considered
in the "linear" case $g(n) = g^n x$, was studied by Leibman [39]. His work leads
one to the following definition.

Definition 1.10 (The Leibman group). Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a collection
of linear forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. For any $i \geq 1$, define $\Psi^{[i]}$ to be the
linear subspace of $\mathbb{R}^t$ spanned by the vectors $(\psi_1^j(n), \ldots, \psi_t^j(n))$ for $1 \leq j \leq i$
and $n \in \mathbb{Z}^D$. Given a filtered nilmanifold $(G/\Gamma, G_\bullet)$, we define the Leibman
group $G^\Psi \trianglelefteq G^t$ to be the Lie subgroup of $G^t$ generated by the elements $g_i^{v_i}$
for $i \geq 1$, $g_i \in G_{(i)}$, and $v_i \in \Psi^{[i]}$, with the convention that$^6$

$$ g^{(v_1, \ldots, v_t)} := (g^{v_1}, \ldots, g^{v_t}) $$

for each $g \in G$. Note that $G^\Psi$ is normal in $G^t$ because $G_{(i)}$ is normal in
$G$. We will show in §3 that $G^\Psi$ is also a rational subgroup of $G^t$, thus
$\Gamma^\Psi := \Gamma^t \cap G^\Psi$ is a discrete cocompact subgroup of $G^\Psi$.
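The subspaces $\Psi^{[i]}$ in Definition 1.10 are easy to compute in small cases. The Python sketch below estimates $\dim \Psi^{[i]}$ by collecting the vectors $(\psi_1^j(n), \ldots, \psi_t^j(n))$ over a finite box of $n$ and taking an exact rank. The helper names `rank` and `leibman_subspace_dim`, the encoding of a linear form as a tuple of coefficients, and the assumption that a small box already spans the whole subspace are all choices of this illustration rather than anything from the text.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over Q."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def leibman_subspace_dim(forms, i, box=3):
    """Dimension of the span of (psi_1(n)^j, ..., psi_t(n)^j), 1 <= j <= i.

    Each form is a tuple of integer coefficients. Assumption of this sketch:
    sampling n in the box [-box, box]^D already spans the full subspace.
    """
    D = len(forms[0])
    vectors = []
    for n in product(range(-box, box + 1), repeat=D):
        values = [sum(c * x for c, x in zip(form, n)) for form in forms]
        for j in range(1, i + 1):
            vectors.append([v ** j for v in values])
    return rank(vectors)


# The forms (1.4) for 4-term progressions: psi_i(n1, n2) = n1 + (i-1) n2.
ap4 = [(1, 0), (1, 1), (1, 2), (1, 3)]
print([leibman_subspace_dim(ap4, i) for i in (1, 2, 3)])
```

For the progression forms (1.4), $\Psi^{[i]}$ consists of the values at $0, 1, \ldots, k-1$ of polynomials of degree at most $i$, so its dimension is $\min(i+1, k)$.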

Examples. Two particular instances of this construction correspond to the
two systems of linear forms (1.4) and (1.5) above. In the case of arithmetic
progressions, where $\Psi$ is as in (1.4), the Leibman group $G^\Psi$ is sometimes referred
to as the Hall-Petresco group $\mathrm{HP}^k(G_\bullet)$ and has the particularly simple
alternative description

$$ \mathrm{HP}^k(G_\bullet) = G^\Psi = \{(g(0), \ldots, g(k-1)) : g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)\}. $$

We will prove this fact later. In the case of parallelepipeds, where $\Psi$ is as
in (1.5), the Leibman group $G^\Psi$ has been referred to as the Host-Kra cube
group [29] and it too has an alternative description. See [29, Appendix E]
for more information: we will not be making use of this particular group
here.

Let $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ be a polynomial sequence, and let $\Psi = (\psi_1, \ldots, \psi_t)$
be a collection of linear forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. It turns out (see Lemma
3.2) that the sequence $g^\Psi$ takes values in $G^\Psi$. More remarkably, the orbit
(1.6) is in fact totally equidistributed on $G^\Psi/\Gamma^\Psi$ if $g$ is sufficiently irrational.
It is this result that we refer to as our counting lemma.

6 We define $g^v$ for real $v$ by the formula $g^v := \exp(v \log(g))$, where $\exp : \mathfrak{g} \to G$ is the
usual exponential map from the Lie algebra $\mathfrak{g}$ to $G$ (this is a homeomorphism since $G$ is
nilpotent, connected, and simply connected).
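Theorem 1.11 (Counting lemma). Let $M, D, t, s$ be integers with $1 \leq D, t,
s \leq M$, let $(G/\Gamma, G_\bullet)$ be a degree $\leq s$ filtered nilmanifold of complexity $\leq M$,
let $g : \mathbb{Z} \to G$ be an $(A, N)$-irrational polynomial sequence adapted to $G_\bullet$,
let $\Psi = (\psi_1, \ldots, \psi_t)$ be a collection of linear forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$
with coefficients of magnitude at most $M$, and let $P$ be a convex subset of
$[-N, N]^D$. Then for any Lipschitz function $F : (G/\Gamma)^t \to \mathbb{C}$ of Lipschitz
norm at most $M$, one has$^7$

$$ \sum_{n \in \mathbb{Z}^D \cap P} F(g^\Psi(n)\Gamma^t) = \mathrm{vol}(P) \int_{g(0)^\Delta G^\Psi/\Gamma^\Psi} F + o_{A \to \infty; M}(N^D) + o_{N \to \infty; M}(N^D), $$

where $g(0)^\Delta := (g(0), \ldots, g(0)) \in G^t$ and the integral is with respect to the
probability Haar measure on the coset

$$ g(0)^\Delta G^\Psi/\Gamma^\Psi, $$

viewed as a subnilmanifold of $(G/\Gamma)^t$, and $\mathrm{vol}(P)$ is the Lebesgue measure
of $P$ in $\mathbb{R}^D$.

More generally, whenever $\Lambda \subseteq \mathbb{Z}^D$ is a sublattice of index $[\mathbb{Z}^D : \Lambda] \leq M$,
and $n_0 \in \mathbb{Z}^D$, one has

$$ \sum_{n \in (n_0 + \Lambda) \cap P} F(g^\Psi(n)\Gamma^t) = \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} \int_{g(0)^\Delta G^\Psi/\Gamma^\Psi} F + o_{A \to \infty; M}(N^D) + o_{N \to \infty; M}(N^D). $$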

The counting lemma is, of course, best understood by seeing it in action,
as we shall do several times later on. The errors $o_{A \to \infty; M}(N^D)$ and
$o_{N \to \infty; M}(N^D)$ are negligible in most applications, as $A$ will typically be a
huge function $\mathcal{F}(M)$ of $M$, and $N$ can also be taken to be arbitrarily large
compared to $M$.

We remark that one could easily extend the above lemma to control
averages of virtual irrational nilsequences, rather than just irrational sequences,
by introducing some additional integrations over the local factors $\mathbb{Z}/q\mathbb{Z}$ and
$\mathbb{R}$, but this would require even more notation than is currently being used
and so we do not describe such an extension here.

Applications. The proofs of the regularity and counting lemmas
occupy about half the paper. In the remaining half, we give a number of
applications of these results to problems in additive combinatorics. The
scheme of the arguments in all of these cases is similar. First, one applies
the arithmetic regularity lemma to decompose the relevant function $f$ into
structured, small, and (very) uniform components $f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}}$.
Very roughly speaking, these are analysed as follows:



7 We use $o_{A \to \infty; M}(X)$ to denote a quantity bounded in magnitude by $c_M(A) X$, where
$c_M(A) \to 0$ as $A \to \infty$ for fixed $M$. Similarly for other choices of subscripts.
$f_{\mathrm{nil}}$ is studied using algebraic properties of nilsequences, particularly the
counting lemma;

$f_{\mathrm{sml}}$ is shown to be negligible, though often (unfortunately) some
additional algebraic input is required to ensure that this error does not conspire
to destroy the contribution from $f_{\mathrm{nil}}$;

$f_{\mathrm{unf}}$ is easily shown to be negligible using results of "generalised von
Neumann" type as discussed in §4.

As we shall see, dealing with the error $f_{\mathrm{sml}}$ can cause a certain amount
of pain. To show that this error is truly negligible, one often has to prove
that patterns guaranteed by $f_{\mathrm{nil}}$ (such as arithmetic progressions) do not
concentrate on some small set which might be contained in the support of
$f_{\mathrm{sml}}$.

We now give specific examples of this paradigm. In §6 we give a "new"
proof of Szemerédi's famous theorem on arithmetic progressions. This is
hardly exciting nowadays, with at least 14 proofs already in the literature
[21 [31 El H21 [HI [HI S21 S3l SSI SHI ED] as well as (slightly implicitly) in
[U [33l [55] . However this proof makes the point that for a certain class of 
problems it suffices to "check the result for nilsequences", and in so doing 
one really sees the structure of the problem. Just as random and structured 
graphs are two obvious classes to test conjectures against in graph theory, 
we would like to raise awareness of nilsequences as potential (and, in certain 
cases such as this one, the only) sources of counterexamples. 

The second application, proven in §5, is to establish a conjecture of
Bergelson, Host and Kra [4]. Here and in the sequel we use the notation $X \ll_{\alpha,\varepsilon} Y$
or $Y \gg_{\alpha,\varepsilon} X$ synonymously with $X = O_{\alpha,\varepsilon}(Y)$, and similarly for other choices
of subscripts.

Theorem 1.12 (Bergelson-Host-Kra conjecture). Let $k = 1, 2, 3$ or $4$, and
suppose that $0 < \alpha < 1$ and $\varepsilon > 0$. Then for any $N \geq 1$ and any subset
$A \subseteq [N]$ of density $|A| \geq \alpha N$, one can find $\gg_{\alpha,\varepsilon} N$ values of $d \in [-N, N]$
such that there are at least $(\alpha^k - \varepsilon) N$ $k$-term arithmetic progressions in $A$
with common difference $d$.

Remarks. The claim is trivial for $k = 1$, and follows from an easy
averaging argument when $k = 2$. This theorem was established in the case
$k = 3$ by the first author in [23]: we give a new proof of this result which
may be of independent interest. The case $k = 4$ is new, although a finite
field analogue of this result previously appeared in lecture notes of the first
author [24] (reporting on joint work). A counterexample of Ruzsa
in the appendix to [4] shows that Theorem 1.12 fails when $k \geq 5$.

Finally, in §7 we establish a generalisation of a recent result of Gowers
and Wolf [20], [21], [22] regarding the "true" complexity of a system of linear
forms.

Theorem 1.13. Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a collection of linear forms from
$\mathbb{Z}^D \to \mathbb{Z}$, and let $s \geq 1$ be an integer such that the polynomials $\psi_1^{s+1}, \ldots, \psi_t^{s+1}$
are linearly independent. Then for any function $f : [N] \to \mathbb{C}$ bounded in
magnitude by 1 (and defined to be zero outside of $[N]$) obeying the bound
$\|f\|_{U^{s+1}[N]} \leq \delta$ for some $\delta > 0$, one has

$$ \mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) = o_{\delta \to 0; s, D, t, \Psi}(1). $$

Remarks. This result was conjectured in [20], where it was shown that the
linear independence hypothesis was necessary. The programme in [20], [21],
[22] gives an alternate approach to this result that avoids explicit mention
of nilsequences, and in particular establishes the counterpart to Theorem
1.13 in finite characteristic; their work also gives a proof of this theorem in
the case when the Cauchy-Schwarz complexity of the system (see Theorem
4.1) is at most two, and with better bounds than our result, which is all
but ineffective. It is worth mentioning that the arguments in [20], [21], [22]
also develop several structural decomposition theorems along the lines of
Theorem 1.2, but using the language of locally polynomial phases rather
than nilsequences.
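The linear independence hypothesis of Theorem 1.13 can be tested mechanically for a given system of forms. The Python sketch below evaluates the polynomials $\psi_i^{s+1}$ on the grid $\{0, \ldots, s+1\}^D$ and checks whether the resulting columns have full rank; a polynomial of degree at most $s+1$ in each variable that vanishes on this grid is identically zero, so the columns are independent exactly when the polynomials are. The helper names `rank` and `powers_independent`, and the encoding of a form as a tuple of integer coefficients, are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over Q."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def powers_independent(forms, s):
    """True if psi_1^(s+1), ..., psi_t^(s+1) are linearly independent polynomials.

    Each form is a tuple of integer coefficients (c_1, ..., c_D), standing for
    psi(n) = c_1 n_1 + ... + c_D n_D.
    """
    D = len(forms[0])
    grid = product(range(s + 2), repeat=D)
    rows = [[sum(c * x for c, x in zip(form, n)) ** (s + 1) for form in forms]
            for n in grid]
    return rank(rows) == len(forms)


# 3-term and 4-term progressions n1, n1 + n2, n1 + 2 n2, (n1 + 3 n2).
ap3 = [(1, 0), (1, 1), (1, 2)]
ap4 = ap3 + [(1, 3)]
print(powers_independent(ap3, 1), powers_independent(ap4, 1), powers_independent(ap4, 2))
```

For 4-term progressions the four squares lie in the three-dimensional space of binary quadratic forms, so the hypothesis fails for $s = 1$ and holds for $s = 2$, in line with the earlier remark that 4-term progressions require the $s = 2$ lemmas.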

Relation to previous work. A result closely related to Theorem
1.2 in the case $s = 1$ was proved by Bourgain as long ago as 1989 [6].
In that paper, the decomposition was applied to give a different proof of
Roth's theorem, that is to say Szemerédi's theorem for 3-term progressions.
A different take on this result was supplied by the first author in [23], where
the application to the case $k = 3$ of the Bergelson-Host-Kra conjecture was
noted. In that same paper a construction of Gowers [14] was modified to
show that any application of the arithmetic regularity lemma must lead to
awful (tower-type) bounds; the same kind of construction would show that
the cases $s \geq 2$ of Theorem 1.2 lead to tower-type bounds as well. In$^8$ [24]
the analogue of the case $s = 2$ of Theorem 1.2 in a finite field setting was
stated, proved, and used to deduce the finite field analogue of the
Bergelson-Host-Kra conjecture in the case $k = 4$. In that same paper the present work
was promised (as reference [22]) at "some future juncture". Four years later
we have reached that juncture and we apologise for the delay. We note,
however, that until the very recent resolution of the inverse conjectures for
the Gowers norms [31], [32] many of our results would have been conditional;
furthermore, we are heavily dependent on our work [28], which had not been
envisaged when the earlier promise was made.

In the meantime a greater general understanding of decomposition
theorems of this type has developed through the work of Gowers [19],
Reingold-Trevisan-Tulsiani-Vadhan [33], and Gowers-Wolf [20], [21], [22]; see also the
survey [52] of the second author. While Theorem 1.2 is related to several of
these general decomposition theorems, it also relies upon specific structure


8 The relevant part of these lecture notes by the first author reported on joint work of
the two of us. 
of nilmanifolds. In any case it seems appropriate, in this volume, to give a
proof using the "energy increment argument" pioneered by Szemerédi.

Acknowledgments. BG was, while this work was being carried out, 
a fellow at the Radcliffe Institute at Harvard. He is very happy to thank 
the Institute for providing excellent working conditions. TT is supported by
a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and 
by the NSF Waterman award. 

2. Proof of the arithmetic regularity lemma 

We now prove Theorem 1.2. The proof proceeds in two main stages.
Firstly, we establish a "non-irrational regularity lemma", which establishes
a weaker version of Theorem 1.2 in which the structured component $f_{\mathrm{nil}}$ is a
polynomial nilsequence, but one which is not assumed to be irrational. The
main tool here is the inverse conjecture GI(s) for the Gowers norms [32],
combined with the energy incrementation argument that appears in proofs
of the graph regularity lemma. In the second stage, we upgrade this weaker
regularity lemma to the full regularity lemma by converting the nilsequence
to an irrational nilsequence. The main tool here is a dimension reduction
argument and a factorisation of nilsequences similar to that appearing in
[28].

The non-irrational regularity lemma. We begin the first stage 
of the argument. As mentioned above, the key ingredient is the following 
result. 

Theorem 2.1 (GI(s)). Let $s \geq 1$, and suppose that $f : [N] \to \mathbb{C}$ is a
function bounded in magnitude by 1 such that $\|f\|_{U^{s+1}[N]} \geq \delta$ for some
$\delta > 0$. Then there is a degree $\leq s$ polynomial nilsequence $\psi : \mathbb{Z} \to \mathbb{C}$ of
complexity $O_{s,\delta}(1)$ such that $|\langle f, \psi \rangle_{L^2[N]}| \gg_{s,\delta} 1$, where

$$ \langle f, \psi \rangle_{L^2[N]} := \mathbb{E}_{n \in [N]} f(n) \overline{\psi(n)} $$

is the usual inner product.

Remark. The difficulty of this conjecture increases with $s$. The case
$s = 1$ easily follows from classical harmonic analysis. The case $s = 2$ was
established by the authors in [26], building upon the breakthrough paper of
Gowers [15]. The case $s = 3$ was recently established by the authors and
Ziegler in [31], and the general case will appear in the forthcoming paper
[32] by the authors and Ziegler.

For technical reasons, it is convenient to replace the notion of a degree
$\leq s$ polynomial nilsequence by a slightly different concept. The following
definition is not required beyond the end of the proof of Proposition 2.7.

Definition 2.2 (s-measurability). Let $\Phi : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth function
and $s \geq 1$. A subset $E \subseteq [N]$ is said to be $s$-measurable with growth function
$\Phi$ if for every $M \geq 1$, there exists a degree $\leq s$ polynomial nilsequence
$\psi : \mathbb{Z} \to [0,1]$ of complexity $\leq \Phi(M)$ such that

$$ \|\psi - 1_E\|_{L^2[N]} \leq 1/M. $$



An example of a 1-measurable function would be a regular Bohr set, as 
introduced in [7] and discussed further in \2Q\ §2]. We will not need Bohr 
sets elsewhere in this paper, so we shall not dwell any longer on this example. 
However the reader will see ideas related to the basic theory of those sets in 
the proof of Corollary 2.3 below.

We make the simple but crucial observation that if $E, F$ are $s$-measurable
with some growth functions $\Phi, \Phi'$ respectively, then boolean combinations of
$E, F$ such as $E \cap F$, $E \cup F$, or $[N] \setminus E$ are also $s$-measurable with some growth
function depending on $\Phi, \Phi'$. Underlying this, of course, is the fact that
the product and sum of two nilsequences is also a nilsequence, and hence
the set of nilsequences form a kind of algebra (graded by complexity). The
role of algebraic structure of this kind was brought to the fore in the work 
of Gowers [H] cited above. 

Theorem 12.11 then implies 

Corollary 2.3 (Alternate formulation of GI(s)). Let $s \ge 1$, and suppose
that $f : [N] \to [-1, 1]$ is such that $\|f\|_{U^{s+1}[N]} \ge \delta$ for some $\delta > 0$. Then there
exists a growth function $\Phi_{s,\delta}$ depending only on $s, \delta$, and an $s$-measurable
set $E \subset [N]$ with growth function $\Phi_{s,\delta}$, such that

$$|\mathbb{E}_{n \in [N]} f(n) 1_E(n)| \gg_{s,\delta} 1.$$

Proof. We allow implied constants to depend on $s, \delta$. By Theorem 2.1, there
exists a degree $\le s$ polynomial nilsequence $\psi$ of complexity $O(1)$ such that

$$|\langle f, \psi \rangle_{L^2[N]}| \gg 1.$$

By taking real and imaginary parts of $\psi$, and then positive and negative
parts, and rescaling, we may assume without loss of generality that $\psi$ takes
values in $[0, 1]$. By Fubini's theorem, we then have

$$\int_0^1 |\mathbb{E}_{n \in [N]} f(n) 1_{E_t}(n)| \, dt \gg 1,$$

where $E_t := \{n \in [N] : \psi(n) \ge t\}$. We thus see that there is a subset
$\Omega \subset [0, 1]$ of Lebesgue measure $\gg 1$ such that

$$|\mathbb{E}_{n \in [N]} f(n) 1_{E_t}(n)| \gg 1$$

uniformly for all $t \in \Omega$.

It remains to show that at least one of the $E_t$ is $s$-measurable with respect
to a suitable growth function. For any $t \in \mathbb{R}$, we consider the maximal
function

$$M(t) := \sup_{r > 0} \frac{1}{2rN} |\{n \in [N] : |\psi(n) - t| \le r\}|.$$

Footnote. Here we are, in some sense, finding a "regular" nil-Bohr set $\{n \in [N] : \psi(n) \ge t\}$,
that is to say one rather insensitive to small changes in the value of $t$. A similar idea also
appears in [44, Claim 2.2].

From the Hardy-Littlewood maximal inequality or the Besicovitch covering
lemma we have that the set $\{t \in \mathbb{R} : M(t) > \lambda\}$ has Lebesgue measure
$O(1/\lambda)$ for any $\lambda > 0$. Thus, we can find $t \in \Omega$ such that $M(t) = O(1)$.
Fixing such a $t$, we then see that

$$|\{n \in [N] : |\psi(n) - t| \le r\}| \ll rN$$

for all $r > 0$. As a consequence, for any $r > 0$, one can then approximate
$1_{E_t}$ to within $O(\sqrt{r})$ in $L^2[N]$ norm by a Lipschitz function of $\psi$ with Lipschitz
norm $O(1/r)$. This implies that $E_t$ is $s$-measurable with some growth
function $\Phi$ depending only on $s, \delta$, and the claim follows. □
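
To make the final approximation step concrete, here is one possible choice of Lipschitz cutoff. The function $\phi_r$ below is our own illustrative notation and is only a sketch of how the bound $O(\sqrt{r})$ can arise; it is not taken from the source. Set

$$\phi_r(x) := \min\big(1, \max\big(0, 1 + (x - t)/r\big)\big),$$

so that $\phi_r(x) = 1$ for $x \ge t$, $\phi_r(x) = 0$ for $x \le t - r$, and $\phi_r$ has Lipschitz constant $1/r$. The functions $1_{E_t}$ and $\phi_r \circ \psi$ agree except at those $n$ with $t - r < \psi(n) < t$, where they differ by at most 1, and hence

$$\|1_{E_t} - \phi_r \circ \psi\|_{L^2[N]}^2 \le \frac{1}{N} |\{n \in [N] : |\psi(n) - t| \le r\}| \ll r.$$

Taking $r$ to be a small multiple of $1/M^2$ would then give an error of at most $1/M$, at the cost of a Lipschitz norm of size $O(M^2)$ for the cutoff.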

We rephrase this fact in terms of conditional expectations. The following
definition, like Definition 2.2, will only be needed until the end of the proof
of Proposition 2.7.

Definition 2.4 ($s$-factors). An $s$-factor $\mathcal{B}$ of complexity $\le M$ and growth
function $\Phi$ is a partition of $[N]$ into at most $M$ sets (or cells) $E_1, \dots, E_m$
which are $s$-measurable of growth function $\Phi$. Given an $s$-factor $\mathcal{B}$ and
a function $f : [N] \to \mathbb{C}$, we define the conditional expectation $\mathbb{E}(f|\mathcal{B}) :
[N] \to \mathbb{C}$ of $f$ with respect to the $s$-factor to be the function which equals
$\mathbb{E}_{n \in E_j} f(n)$ on each cell $E_j$ of the partition. We define the index or energy $\mathcal{E}(\mathcal{B})$
of the $s$-factor $\mathcal{B}$ relative to $f$ to be the quantity $\|\mathbb{E}(f|\mathcal{B})\|_{L^2[N]}^2$.

An $s$-factor $\mathcal{B}'$ is said to refine another $\mathcal{B}$ if every cell of $\mathcal{B}'$ is contained
in a cell of $\mathcal{B}$.

Corollary 2.5 (Lack of uniformity implies energy increment). Let $s \ge 1$,
let $\mathcal{B}$ be an $s$-factor of complexity $\le M$ and some growth function $\Phi$, and
suppose that $f : [N] \to [0, 1]$ is such that $\|f - \mathbb{E}(f|\mathcal{B})\|_{U^{s+1}[N]} \ge \delta$ for some
$\delta > 0$. Then there exists a refinement $\mathcal{B}'$ of $\mathcal{B}$ of complexity $\le 2M$ and some
growth function depending on $s, \delta, M, \Phi$, such that

$$\mathcal{E}(\mathcal{B}') - \mathcal{E}(\mathcal{B}) \gg_{s,\delta} 1.$$

Proof. By Corollary 2.3, we can find an $s$-measurable set $E$ with a growth
function depending on $s, \delta$ such that

$$|\langle f - \mathbb{E}(f|\mathcal{B}), 1_E \rangle_{L^2[N]}| \gg_{s,\delta} 1. \qquad (2.1)$$

Now let $\mathcal{B}'$ be the partition generated by $\mathcal{B}$ and $E$; then $\mathcal{B}'$ clearly has
complexity $\le 2M$ and a growth function depending on $s, \delta, M, \Phi$. Since $1_E$
is measurable with respect to the partition $\mathcal{B}'$ (that is to say it is constant
on each cell of this partition), we can rewrite the left-hand side of (2.1) as

$$|\langle \mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B}), 1_E \rangle_{L^2[N]}|$$

and hence by the Cauchy-Schwarz inequality

$$\|\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B})\|_{L^2[N]} \gg_{s,\delta} 1.$$

The claim then follows from Pythagoras' theorem. □
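
The appeal to Pythagoras' theorem can be checked on a toy example. The following is a minimal sketch, not part of the original argument: the function names (`conditional_expectation`, `energy`, `refine`) and the sample data are our own hypothetical choices. It computes conditional expectations on a partition of a small index set with exact rational arithmetic, and confirms that refining the partition by a set $E$ increases the energy by exactly the squared $L^2$ distance between the two conditional expectations.

```python
from fractions import Fraction


def conditional_expectation(f, cells):
    """Return E(f|B) as a list, where `cells` is a partition of range(len(f))."""
    out = [Fraction(0)] * len(f)
    for cell in cells:
        avg = sum(Fraction(f[n]) for n in cell) / len(cell)
        for n in cell:
            out[n] = avg
    return out


def energy(f, cells):
    """Energy of the partition relative to f: the squared L^2 norm of E(f|B)."""
    g = conditional_expectation(f, cells)
    return sum(x * x for x in g) / len(f)


def refine(cells, E):
    """Partition generated by `cells` and the set E (empty pieces are dropped)."""
    E = set(E)
    new_cells = []
    for cell in cells:
        inside = [n for n in cell if n in E]
        outside = [n for n in cell if n not in E]
        new_cells.extend(c for c in (inside, outside) if c)
    return new_cells


if __name__ == "__main__":
    f = [0, 1, 1, 0, 1, 0, 0, 1]          # hypothetical function on 8 points
    B = [[0, 1, 2, 3], [4, 5, 6, 7]]      # coarse partition
    B2 = refine(B, {1, 2, 4})             # refinement by a set E
    print(energy(f, B2) - energy(f, B))
```

Exact fractions are used so that the identity holds with equality rather than up to rounding error.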
We can iterate this to obtain a weak regularity lemma, analogous to the
weak graph regularity lemma of Frieze and Kannan [13].

Corollary 2.6. Let $s \ge 1$, let $\mathcal{B}$ be an $s$-factor of complexity $\le M$ and
some growth function $\Phi$, let $f : [N] \to [0, 1]$, and let $\varepsilon > 0$. Then there
exists a refinement $\mathcal{B}'$ of $\mathcal{B}$ of complexity $O_{M,s,\varepsilon}(1)$ and some growth function
depending on $s, \varepsilon, M, \Phi$, such that

$$\|f - \mathbb{E}(f|\mathcal{B}')\|_{U^{s+1}[N]} \le \varepsilon. \qquad (2.2)$$

Proof. We define a sequence of successively more refined factors $\mathcal{B}'$, starting
with $\mathcal{B}' := \mathcal{B}$. If (2.2) already holds then we are done, so suppose that
this is not the case. Then by Corollary 2.5, we can find a refinement $\mathcal{B}''$
of complexity $O_{M,s,\varepsilon}(1)$ and some growth function depending on $s, \varepsilon, M, \Phi$
whose energy is larger than that of $\mathcal{B}'$ by an amount $\gg_{s,\varepsilon} 1$. On the other
hand, the energy clearly ranges between 0 and 1. Thus after replacing $\mathcal{B}'$
with $\mathcal{B}''$ and iterating this algorithm at most $O_{s,\varepsilon}(1)$ times we obtain the
claim. □

One final iteration then gives the full non-irrational regularity lemma. 

Proposition 2.7. Let $f : [N] \to [0, 1]$, let $s \ge 1$, let $\varepsilon > 0$, and let
$\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth function. Then there exists a quantity $M = O_{s,\varepsilon,\mathcal{F}}(1)$
and a decomposition

$$f = f_{nil} + f_{sml} + f_{unf}$$

of $f$ into functions $f_{nil}, f_{sml}, f_{unf} : [N] \to [-1, 1]$ such that:

($f_{nil}$ structured) $f_{nil}$ equals a degree $\le s$ polynomial nilsequence of complexity $\le M$.

($f_{sml}$ small) $\|f_{sml}\|_{L^2[N]} \le \varepsilon$.

($f_{unf}$ very uniform) $\|f_{unf}\|_{U^{s+1}[N]} \le 1/\mathcal{F}(M)$.

(Nonnegativity) $f_{nil}$ and $f_{nil} + f_{sml}$ take values in $[0, 1]$.

Proof. We need a growth function $\tilde{\mathcal{F}} : \mathbb{R}^+ \to \mathbb{R}^+$, somewhat more rapidly
growing than $\mathcal{F}$ in a manner that depends on $\mathcal{F}, s, \varepsilon$. We will specify the
exact requirements we have of it later. We then define a sequence $1 = M_0 \le
M_1 \le \dots$ by setting $M_0 := 1$ and $M_{i+1} := \tilde{\mathcal{F}}(M_i)$.

Applying Corollary 2.6 repeatedly, we may find for each $i \ge 0$ an $s$-factor
$\mathcal{B}_i$ of complexity $O_{s,M_i}(1)$ and a growth function depending on $s, M_i$, such
that each $\mathcal{B}_i$ refines $\mathcal{B}_{i-1}$, and such that

$$\|f - \mathbb{E}(f|\mathcal{B}_i)\|_{U^{s+1}[N]} \le 1/M_i$$

for all $i \ge 0$.

By Pythagoras' theorem, the energies $\mathcal{E}(\mathcal{B}_i)$ are non-decreasing, and also
range between 0 and 1. Thus by the pigeonhole principle, one can find
$i = O_\varepsilon(1)$ such that

$$\mathcal{E}(\mathcal{B}_{i+1}) - \mathcal{E}(\mathcal{B}_i) \le \varepsilon^2/4,$$

which by Pythagoras' theorem again is equivalent to

$$\|\mathbb{E}(f|\mathcal{B}_{i+1}) - \mathbb{E}(f|\mathcal{B}_i)\|_{L^2[N]} \le \varepsilon/2.$$

Meanwhile, as $\mathcal{B}_i$ is an $s$-factor and $f$ is bounded, we can find a degree $\le s$
polynomial nilsequence $f_{nil} : [N] \to \mathbb{R}$ of complexity $O_{s,M_i}(1)$ such that

$$\|\mathbb{E}(f|\mathcal{B}_i) - f_{nil}\|_{L^2[N]} \le \varepsilon/2.$$

Since $\mathbb{E}(f|\mathcal{B}_i)$ ranges in $[0, 1]$, we may retract $f_{nil}$ to $[0, 1]$ also (note that this
does not increase the complexity of $f_{nil}$). If we then set $f_{unf} := f - \mathbb{E}(f|\mathcal{B}_{i+1})$
and $f_{sml} := \mathbb{E}(f|\mathcal{B}_{i+1}) - f_{nil}$, we obtain the claim. □
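
The pigeonhole step in the proof above can be illustrated numerically. The sketch below is our own illustration with hypothetical helper names (`find_small_increment`, `pigeonhole_bound`); it takes a non-decreasing list of energies in $[0, 1]$ and finds the first index at which the increment is at most $\varepsilon^2/4$. Since the increments sum to at most 1, fewer than $4/\varepsilon^2$ of them can exceed $\varepsilon^2/4$, which is the source of the bound $i = O_\varepsilon(1)$.

```python
import math


def find_small_increment(energies, eps):
    """Return the first i with energies[i+1] - energies[i] <= eps^2 / 4, or None."""
    threshold = eps * eps / 4
    for i in range(len(energies) - 1):
        if energies[i + 1] - energies[i] <= threshold:
            return i
    return None


def pigeonhole_bound(eps):
    """Strict upper bound for the index found above.

    If the increments at indices 0..k-1 all exceed eps^2/4, then
    k * eps^2 / 4 < 1, so k < 4 / eps^2 <= ceil(4 / eps^2).
    """
    return math.ceil(4 / eps ** 2)


if __name__ == "__main__":
    energies = [0, 0.3, 0.6, 0.9, 1.0, 1.0]   # hypothetical energy sequence
    eps = 1.0
    print(find_small_increment(energies, eps), pigeonhole_bound(eps))
```

The guarantee only applies when the list is long enough, that is, when it contains at least `pigeonhole_bound(eps) + 1` energies.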

Remark. The application of the Hardy-Littlewood maximal inequality in
the proof of Corollary 2.3 makes for a reasonably tidy argument. A more
direct approach would be to carve up $[N]$ into approximate level sets of
nilsequences, and then to approximate the projections onto the factors thus
defined by nilsequences using the Weierstrass approximation theorem. There
are a number of technicalities involved in this approach, chiefly involving the
need to choose the approximate level sets randomly. This kind of argument
was employed, in a closely related context, in [25, Chapter 7]. One can also
utilise arguments based on the Hahn-Banach theorem instead; see [19],
P], and [201 EH [22]. 

Obtaining irrationality. Our task now is to replace the nilsequence
$f_{nil}$ appearing in Proposition 2.7 with a highly "irrational" nilsequence as
advertised in the statement of our main theorem, Theorem 1.2. It turns out
to be sufficient to establish the following claim.

Proposition 2.8. Let $s, M_0 \ge 1$, let $\mathcal{F}$ be a growth function, and let $\psi :
\mathbb{Z} \to [0, 1]$ be a degree $\le s$ nilsequence of complexity $\le M_0$. Then there
exists an $M = O_{s,M_0,\mathcal{F}}(1)$ such that $\psi$ (when restricted to $[N]$) is also a
$(\mathcal{F}(M), N)$-irrational degree $\le s$ virtual nilsequence of complexity $\le M$ at
scale $N$.

To establish Theorem 1.2 from this and Proposition 2.7, one first applies
the latter result with $\mathcal{F}$ replaced by a much more rapid growth function
$\mathcal{F}'$, and then one applies Proposition 2.8 to the structured component $f_{nil}$
obtained in Proposition 2.7.

It remains to prove Proposition 2.8. Let $s, M_0, \mathcal{F}, \psi$ be as in that
proposition. By definition, we have $\psi(n) = F_0(g_0(n)\Gamma)$ for some degree $\le s$
filtered nilmanifold $(G/\Gamma, G_\bullet)$ of complexity $\le M_0$, a polynomial sequence
$g_0 \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$, and a function $F_0 : G/\Gamma \to \mathbb{C}$ which has a Lipschitz
norm of at most $M_0$. Since $\psi$ takes values in $[0, 1]$, we may assume without
loss of generality that $F_0$ is real, and by replacing $F_0$ with the retraction
$\max(\min(F_0, 1), 0)$ to $[0, 1]$ if necessary, we may assume that $F_0$ also takes
values in $[0, 1]$. Henceforth $(G/\Gamma, G_\bullet)$, $g_0$, and $F_0$ are fixed.

Factorisation results. One of the main results of our paper [28] was
a decomposition of an arbitrary polynomial nilsequence $g$ on $G/\Gamma$ into a
product $\beta g' \gamma$, where $\beta$ is "smooth", $\gamma$ is "rational", and $g'(n)\Gamma'$ is
equidistributed inside some possibly smaller nilmanifold $G'/\Gamma'$. We need a similar
result here, but with $g'$ having the somewhat stronger property of being
irrational that we mentioned in the introduction. The notion of irrationality
is discussed in more detail in Appendix A.

We will be also using the notions of smooth and rational polynomial
sequences from [28]. Again, the basic definitions and properties of these
concepts are recalled in Appendix A.

Define a complexity $\le M$ subnilmanifold of $(G/\Gamma, G_\bullet)$ to be a degree $\le s$
filtered nilmanifold $(G'/\Gamma', G'_\bullet)$ of complexity $\le M$, where each subgroup
$G'_{(i)}$ in the filtration $G'_\bullet$ is a rational subgroup of the associated subgroup
$G_{(i)}$ of complexity $\le M$, $\Gamma' = G' \cap \Gamma$, and each element of the Mal'cev
basis of $(G'/\Gamma', G'_\bullet)$ is a rational linear combination of the Mal'cev basis of
$(G/\Gamma, G_\bullet)$, where the coefficients all have height $\le M$. We define the total
dimension of such a nilmanifold to be the quantity $\sum_{i=0}^{s} \dim(G'_{(i)})$; this is
also the dimension of $\mathrm{poly}(\mathbb{Z}, G'_\bullet)$ (thanks to the Taylor series expansion,
Lemma A.1).

We make the easy remark that if $(G'/\Gamma', G'_\bullet)$ is a complexity $\le M$
subnilmanifold of $(G/\Gamma, G_\bullet)$ for some $M \ge M_0$, and $(G''/\Gamma'', G''_\bullet)$ is a complexity
$\le M$ subnilmanifold of $(G'/\Gamma', G'_\bullet)$, then $(G''/\Gamma'', G''_\bullet)$ is a complexity $O_M(1)$
subnilmanifold of $(G/\Gamma, G_\bullet)$.

Our first lemma is very similar in form to [28, Lemma 7.9].

Lemma 2.9 (Initial factorisation). Let $(G'/\Gamma', G'_\bullet)$ be a complexity $\le M$
subnilmanifold of $(G/\Gamma, G_\bullet)$ for some $M \ge M_0$, let $g' \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$, and let
$A > 0$ and $N \ge 1$. Then at least one of the following statements holds:

(Irrationality) $g'$ is $(A, N)$-irrational in $(G'/\Gamma', G'_\bullet)$.

(Dimension reduction) There exists a factorisation

$$g' = \beta g'' \gamma$$

where $\beta \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is $(O_{M,A}(1), N)$-smooth, $g'' \in \mathrm{poly}(\mathbb{Z}, G''_\bullet)$ takes
values in a subnilmanifold $(G''/\Gamma'', G''_\bullet)$ of $(G'/\Gamma', G'_\bullet)$ of strictly smaller total
dimension and of complexity $O_{M,A}(1)$, and $\gamma \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is $O_{M,A}(1)$-rational.

Proof. To make this proof a little more readable, we drop one dash from
every expression. Thus $g'$ becomes $g$, $G''$ becomes $G'$, and so on. Suppose
that $g$ is not $(A, N)$-irrational. Recall (see Lemma A.1) that $g$ has a Taylor
expansion that we may write in the form

$$g(n) = g_0 g_1^{\binom{n}{1}} g_2^{\binom{n}{2}} \cdots g_s^{\binom{n}{s}},$$

where $g_i \in G_{(i)}$ for each $i$. It follows from Lemma A.7 that for some $i$,
$1 \le i \le s$, we can factorise

$$g_i = \beta_i g'_i \gamma_i,$$

where $g'_i \in G_{(i)}$ lies in the kernel of some horizontal character $\xi_i : G_{(i)} \to \mathbb{R}$
of complexity $O_{A,M}(1)$, $\gamma_i \in G_{(i)}$ is $O_{A,M}(1)$-rational in the sense that $\gamma_i^m \in
\Gamma_{(i)}$ for some $m = O_{A,M}(1)$, and $\beta_i \in G_{(i)}$ has distance $O_{A,M}(1/N^i)$ from
the origin.

Footnote. In our paper [28] the letter $\varepsilon$ was used for a smooth nilsequence, but we use $\beta$ here
to avoid conflict with various uses of $\varepsilon$ to denote a small positive real number.

We now divide into two cases, depending on whether $i > 1$ or $i = 1$. First
suppose that $i > 1$. Then the Taylor expansion of $g$ reads, with an obvious
notation,

$$g(n) = g_{<i}(n) (\beta_i g'_i \gamma_i)^{\binom{n}{i}} g_{>i}(n).$$

By commuting all the $\beta_i$s to the left and all the $\gamma_i$s to the right, and
using the group properties of polynomial sequences (Theorem 1.6), one can
rewrite this as

$$g(n) = \beta_i^{\binom{n}{i}} g'(n) \gamma_i^{\binom{n}{i}}$$

where

$$g'(n) := g_{<i}(n) (g'_i)^{\binom{n}{i}} g'_{>i}(n)$$

and $g'_{>i}(n)$ is another polynomial sequence taking values in $G_{(i+1)}$. Observe
that $g'$ is then a polynomial sequence adapted to the subnilmanifold
$(G'/\Gamma', G'_\bullet)$, where $G'/\Gamma' = G/\Gamma$ and $G'_{(j)} = G_{(j)}$ for $j \ne i$, but
$G'_{(i)} = \ker(\xi_i)$. This is indeed a subnilmanifold, with complexity $O_{A,M}(1)$;
note that $(G'_{(j)})_{j \ge 0}$ is a filtration, thanks to our insistence in the definition
of $i$-horizontal character (cf. Definition A.6) that $[G_{(j)}, G_{(i-j)}] \subset \ker(\xi_i)$ for
all $0 \le j \le i$. Meanwhile, $\beta_i^{\binom{n}{i}}$ is a $(O_{A,M}(1), N)$-smooth sequence and $\gamma_i^{\binom{n}{i}}$
is a $O_{A,M}(1)$-rational sequence, so we have the desired factorisation in the
$i > 1$ case.

When $i = 1$, the above argument does not quite work, because $G'_{(1)}$ would
be distinct from $G'_{(0)}$ and would thus not qualify as a filtration. But this
can be easily remedied by performing an additional factorisation

$$g_0 = \beta_0 g'_0$$

where $\beta_0 \in G$ is a distance $O_{A,M}(1)$ from the identity, and $g'_0$ lies in the
kernel of $\xi_1$. This leads to a factorisation of the form

$$g(n) = \beta_0 \beta_1^{\binom{n}{1}} g'(n) \gamma_1^{\binom{n}{1}}$$

where

$$g'(n) = g'_0 (g'_1)^{\binom{n}{1}} g'_{>1}(n)$$

and $g'_{>1}$ is a polynomial sequence taking values in $G'_{(2)}$. One then argues as
before, but now one sets both $G'_{(0)}$ and $G'_{(1)}$ equal to the kernel of $\xi_1$. □

We can iterate the above lemma to obtain the following result, which
is analogous to [28, Theorem 1.19]. Apart from dealing with irrationality
rather than equidistribution, the following result is somewhat different to
that just cited in that one requires an arbitrary (rather than polynomial)
growth function, but one does not (of course) need polynomial complexity
bounds. A variant of [28, Theorem 1.19] was also given in [31, Theorem 4.2].

Lemma 2.10 (Complete factorisation). Let $(G/\Gamma, G_\bullet)$ be a degree $\le s$
filtered nilmanifold of complexity $\le M_0$, and let $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$. For any
growth function $\mathcal{F}'$, we can find a quantity $M_0 \le M \le O_{M_0,\mathcal{F}'}(1)$ and a
factorisation $g = \beta g' \gamma$ where:

$\beta \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $(O_M(1), N)$-smooth;

$g' \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is $(\mathcal{F}'(M), N)$-irrational in a subnilmanifold $(G'/\Gamma', G'_\bullet)$
of $(G/\Gamma, G_\bullet)$ of complexity $O_M(1)$; and

$\gamma \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $O_M(1)$-rational.

Proof. We use an iterative argument, setting $\beta = \gamma = \mathrm{id}$, $g' = g$, $M = M_0$,
and $(G'/\Gamma', G'_\bullet) = (G/\Gamma, G_\bullet)$ to begin with. In particular, $(G'/\Gamma', G'_\bullet)$ is
initially a subnilmanifold of $(G/\Gamma, G_\bullet)$ of complexity $O_M(1)$. If $g'$ is
$(\mathcal{F}'(M), N)$-irrational in $(G'/\Gamma', G'_\bullet)$ then we are done; otherwise, by Lemma 2.9, we
may factorise $g' = \beta' g'' \gamma'$ where $\beta'$ is $(O_{M,\mathcal{F}'(M)}(1), N)$-smooth, $\gamma'$ is
$O_{M,\mathcal{F}'(M)}(1)$-rational, and $g''$ now takes values in a subnilmanifold $(G''/\Gamma'', G''_\bullet)$
of $(G'/\Gamma', G'_\bullet)$ of complexity $O_{M,\mathcal{F}'(M)}(1)$ and smaller total dimension than
$(G'/\Gamma', G'_\bullet)$. We then replace $\beta$ by $\beta\beta'$, $\gamma$ by $\gamma'\gamma$, $g'$ by $g''$,
$(G'/\Gamma', G'_\bullet)$ by $(G''/\Gamma'', G''_\bullet)$,
and increase $M$ to a quantity of the form $O_{M,\mathcal{F}'(M)}(1)$, using Lemma A.4 to
conclude that the new $\beta$ is smooth and the new $\gamma$ is rational. We then iterate
this process. Since the total dimension of $(G/\Gamma, G_\bullet)$ is initially $O_{M_0}(1)$, this
process can iterate at most $O_{M_0}(1)$ times, and the claim follows. □

With this lemma we can now establish Proposition 2.8 and hence Theorem
1.2. Let $\mathcal{F}'$ be a rapid growth function (depending on $\varepsilon, M_0, \mathcal{F}$) to be chosen
later. We apply Lemma 2.10, obtaining some $M$ with $M_0 \le M \le O_{M_0,\mathcal{F}'}(1)$
and a factorisation

$$\psi(n) = F_0(\beta(n) g'(n) \gamma(n) \Gamma)$$

with $\beta$, $g'$ and $\gamma$ having the properties described in that lemma.

The sequence $\gamma$ is $O_M(1)$-rational and so, by Lemma A.4, the orbit $n \mapsto
\gamma(n)\Gamma$ is periodic with some period $q = O_M(1)$, and thus $\gamma(n)\Gamma$ depends
only on $n \bmod q$.

For each $n$, the rationality of $\gamma(n)$ ensures that $\gamma(n)\Gamma\gamma(n)^{-1}$ intersects $\Gamma$ in a
subgroup of $\Gamma$ of index $O_M(1)$. Since there are only $O_M(1)$ different possible
values of $\gamma(n)\Gamma$, we may thus find a subgroup $\Gamma'$ of $\Gamma$ of index $O_M(1)$ such
that $\Gamma' \subset \gamma(n)\Gamma\gamma(n)^{-1}$ for all $n$.

We can thus express $\psi$ as a virtual nilsequence

$$\psi(n) = \tilde F(g'(n)\Gamma', n \bmod q, n/N)$$

where $\tilde F : G/\Gamma' \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R}$ is defined by the formula

$$\tilde F(x, a, y) := F_0(\beta(Ny) \tilde x \gamma(\tilde a) \Gamma)$$

whenever $y \in \frac{1}{N}\mathbb{Z}$, and by Lipschitz extension to all $y \in \mathbb{R}$, where $\tilde a$ is any
integer with $\tilde a = a \bmod q$, and $\tilde x$ is any element of $G$ such that $\tilde x \Gamma' = x$.
One easily verifies that $\tilde F$ is well-defined and has a Lipschitz norm of $O_M(1)$.
Also, since $g'$ was already $(\mathcal{F}'(M), N)$-irrational in $G/\Gamma$, and $\Gamma'$ has index
$O_M(1)$ in $\Gamma$, we see that $g'$ is $(\gg_M \mathcal{F}'(M), N)$-irrational in $G/\Gamma'$. Proposition
2.8 now follows by replacing $M$ by a suitable quantity of the form $O_M(1)$,
and choosing $\mathcal{F}'$ sufficiently rapidly growing depending on $\mathcal{F}$.

3. Proof of the counting lemma

The purpose of this section is to prove the counting lemma, Theorem 1.11.
We begin by recalling from the introduction the definition of the Leibman
group $G^\Psi$.

Definition 3.1 (The Leibman group). Let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection
of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. For any $i \ge 1$, define $\Psi^{[i]}$ to be the
linear subspace of $\mathbb{R}^t$ spanned by the vectors $(\psi_1^j(\mathbf{n}), \dots, \psi_t^j(\mathbf{n}))$ for $1 \le j \le i$
and $\mathbf{n} \in \mathbb{Z}^D$. Given a filtered nilmanifold $(G/\Gamma, G_\bullet)$, we define the Leibman
group $G^\Psi \le G^t$ to be the Lie subgroup of $G^t$ generated by the elements $g_i^{\vec v_i}$
for $i \ge 1$, $g_i \in G_{(i)}$, and $\vec v_i \in \Psi^{[i]}$, with the convention that if $\vec v = (v_1, \dots, v_t)$
then

$$g^{\vec v} := (g^{v_1}, \dots, g^{v_t}).$$

Now might be a good time to remark explicitly that we have introduced
a slightly vulgar convention that we hope will help the reader follow this
section and other parts of the paper. Bold font letters such as $\mathbf{n} \in \mathbb{Z}^D$
denote $D$-dimensional vectors, whilst arrows such as $\vec v \in \mathbb{R}^t$ denote $t$-vectors.
Occasionally we shall write $m_i := \dim(\Psi^{[i]})$.

When reading this section, it might be found helpful to have a running
example in mind. We will take as an illustrative example the case $D = 2$,
$t = 4$ and $\Psi = (\psi_0, \dots, \psi_3)$, where $\psi_i(\mathbf{n}) = n_1 + i n_2$ for $i = 0, 1, 2, 3$.
The system $\Psi$, of course, defines a 4-term arithmetic progression. As we
remarked in the introduction, the corresponding Leibman group $G^\Psi$ is also
known as the Hall-Petresco group $\mathrm{HP}^4(G)$. The reader will easily confirm
that in this case we have

$$\Psi^{[1]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3)$$

and

$$\Psi^{[2]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3) \oplus \mathbb{R}(0, 0, 1, 3)$$

and

$$\Psi^{[3]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3) \oplus \mathbb{R}(0, 0, 1, 3) \oplus \mathbb{R}(0, 0, 0, 1) = \mathbb{R}^4.$$
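
The dimensions in the running example can be confirmed by a short computation. The following is a sketch of our own (the helper names `psi`, `rank` and `dim_psi_bracket` are hypothetical): it samples $\mathbf{n}$ from a small box, forms the coordinatewise powers of $\Psi(\mathbf{n})$, and computes the rank of the resulting vectors over $\mathbb{Q}$ with exact arithmetic. Sampling a box of side 7 suffices here because the vectors in question are polynomials of degree at most 3 in $(n_1, n_2)$.

```python
from fractions import Fraction
from itertools import product


def psi(n1, n2):
    """The four linear forms n1 + i*n2, i = 0, 1, 2, 3, of the running example."""
    return [n1 + i * n2 for i in range(4)]


def rank(rows):
    """Rank over Q of a list of integer/rational vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def dim_psi_bracket(i, box=3):
    """Dimension of the span of the powers Psi(n)^j, 1 <= j <= i, with n in a box."""
    vectors = []
    for n1, n2 in product(range(-box, box + 1), repeat=2):
        for j in range(1, i + 1):
            vectors.append([x ** j for x in psi(n1, n2)])
    return rank(vectors)


if __name__ == "__main__":
    print([dim_psi_bracket(i) for i in (1, 2, 3)])
```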

Some work must be done before we can describe $G^\Psi = \mathrm{HP}^4(G)$ in a pleasant
way. However we can already establish the following lemma, whose
statement and proof go some way towards explaining the introduction of the
Leibman group.

Lemma 3.2. Let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection of linear forms $\psi_1, \dots, \psi_t :
\mathbb{Z}^D \to \mathbb{Z}$. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold and that $g \in
\mathrm{poly}(\mathbb{Z}, G_\bullet)$ is a polynomial sequence. Then the sequence $g^\Psi : \mathbb{Z}^D \to G^t$
defined by $g^\Psi(\mathbf{n}) := (g(\psi_1(\mathbf{n})), \dots, g(\psi_t(\mathbf{n})))$ takes values in $G^\Psi$.

Proof. The sequence $g(n)$ has a (unique) Taylor expansion

$$g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}}$$

with $g_i \in G_{(i)}$ for all $i$ (see Lemma A.1). Substituting in, it follows that

$$g^\Psi(\mathbf{n}) = \prod_{i=0}^{s} g_i^{\left(\binom{\psi_1(\mathbf{n})}{i}, \dots, \binom{\psi_t(\mathbf{n})}{i}\right)},$$

and it is immediate from the definition that each element in this product
lies in $G^\Psi$. □

The counting lemma, whose proof is the main objective of this section,
was stated as Theorem 1.11. Essentially, it states that $g^\Psi(\mathbf{n})\Gamma^\Psi$ is
equidistributed in $G^\Psi/\Gamma^\Psi$ as $\mathbf{n}$ ranges over "nice" subsets of "big" lattices, provided
that the original sequence $g$ is suitably irrational. We will recall what that
means in due course, but our first task is to develop the basic theory of the
Leibman group $G^\Psi$. At the moment, for example, we have not established
that $G^\Psi$ is a connected Lie subgroup of $G^t$ or that $G^\Psi/\Gamma^\Psi$ has the structure
of a filtered nilmanifold. Nor have we developed tools for calculating inside
this group.

Basic facts about the Leibman group and nilmanifold. We can
endow $\mathbb{R}^t$ with the structure of a commutative algebra over $\mathbb{R}$ by using the
pointwise product

$$\vec x . \vec y = (x_1 y_1, \dots, x_t y_t)$$

and setting $1 = (1, \dots, 1)$ to be the multiplicative identity. With this algebra
structure, one can view the spaces $\Psi^{[i]}$ defined in Definition 1.10 as the span
of the powers $\Psi(\mathbf{n})^j$ for $\mathbf{n} \in \mathbb{Z}^D$ and $1 \le j \le i$, where we view $\Psi$ as a
homomorphism from $\mathbb{Z}^D$ to $\mathbb{Z}^t$. We have the following alternate definition
of the $\Psi^{[i]}$.

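Lemma 3.3 (Depolarisation). $\Psi^{[i]}$ is the span of the products

$$\Psi(\mathbf{n}_1) \cdots \Psi(\mathbf{n}_j),$$

where $1 \le j \le i$ and $\mathbf{n}_1, \dots, \mathbf{n}_j \in \mathbb{Z}^D$.

Proof. Clearly $\Psi^{[i]}$ is contained in this span. To establish the reverse
containment, we observe the elementary depolarisation identity

$$\Psi(\mathbf{n}_1) \cdots \Psi(\mathbf{n}_j) = \frac{1}{j!} \sum_{\omega \in \{0,1\}^j} (-1)^{j - |\omega|} \Psi(\omega_1 \mathbf{n}_1 + \dots + \omega_j \mathbf{n}_j)^j,$$

where $\omega = (\omega_1, \dots, \omega_j)$ and $|\omega| := \omega_1 + \dots + \omega_j$, and the claim follows. □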
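
The depolarisation identity is easy to test numerically. The sketch below is our own illustration, using the four forms $n_1 + i n_2$ of the running example as the system $\Psi$ (an assumption made purely for concreteness; the helper names are hypothetical). It evaluates both sides with exact rational arithmetic, with the sign convention $(-1)^{j - |\omega|}$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def psi(n):
    """Psi(n) for the running example: the forms n1 + i*n2, i = 0, 1, 2, 3."""
    n1, n2 = n
    return [n1 + i * n2 for i in range(4)]


def pointwise_product(vectors, t=4):
    """Coordinatewise product of a list of t-vectors."""
    out = [Fraction(1)] * t
    for v in vectors:
        out = [a * b for a, b in zip(out, v)]
    return out


def depolarised(ns):
    """Right-hand side of the identity: a signed sum of j-th powers, divided by j!."""
    j = len(ns)
    total = [Fraction(0)] * 4
    for omega in product((0, 1), repeat=j):
        combo = (sum(w * n[0] for w, n in zip(omega, ns)),
                 sum(w * n[1] for w, n in zip(omega, ns)))
        sign = (-1) ** (j - sum(omega))
        total = [acc + sign * Fraction(x) ** j for acc, x in zip(total, psi(combo))]
    return [x / factorial(j) for x in total]


if __name__ == "__main__":
    ns = [(1, 2), (-3, 1), (2, 5)]        # hypothetical sample points in Z^2
    print(depolarised(ns) == pointwise_product([psi(n) for n in ns]))
```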

As an immediate consequence we have

Corollary 3.4 (Filtration property). For any $i, j \ge 0$, we have $\Psi^{[i]} \cdot \Psi^{[j]} \subset
\Psi^{[i+j]}$.

Let $(G/\Gamma, G_\bullet)$ be a degree $\le s$ filtered nilmanifold. From Definition 1.10,
the Leibman group $G^\Psi$ is the subgroup of $G^t$ generated by the group elements
$g_i^{\vec v_i}$ for $i \ge 1$, $\vec v_i \in \Psi^{[i]}$, and $g_i \in G_{(i)}$. For any $i_0 \ge 1$, let $G^\Psi_{(i_0)}$ be the
subgroup of $G^\Psi$ generated by those $g_i^{\vec v_i}$ with $i \ge i_0$, $\vec v_i \in \Psi^{[i]}$, $g_i \in G_{(i)}$, with
the convention that $G^\Psi_{(0)} := G^\Psi$.

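Lemma 3.5 (Filtration property for $G^\Psi$). $G^\Psi_\bullet := (G^\Psi_{(i)})_{i \ge 0}$ is a filtration
on $G^\Psi$. In other words, the $G^\Psi_{(i)}$ are nested with $[G^\Psi_{(i)}, G^\Psi_{(j)}] \subset G^\Psi_{(i+j)}$ for all
$i, j \ge 0$.

Proof. It suffices to check that if $g_i \in G_{(i)}$, $g_j \in G_{(j)}$, $\vec v_i = (v_{i1}, \dots, v_{it}) \in \Psi^{[i]}$
and $\vec v_j = (v_{j1}, \dots, v_{jt}) \in \Psi^{[j]}$, then $[g_i^{\vec v_i}, g_j^{\vec v_j}] \in G^\Psi_{(i+j)}$. But this follows from
the Baker-Campbell-Hausdorff formula (see (C.2)), the filtration property
of $G_{(i)}$ and Corollary 3.4. □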

The spaces $\Psi^{[i]}$ form a flag

$$\Psi^{[1]} \le \dots \le \Psi^{[s]} \le \mathbb{R}^t$$

of subspaces which are rational (i.e. they can be defined over $\mathbb{Q}$). From
a greedy algorithm (and clearing denominators) we may thus find a basis
$\vec v_1, \dots, \vec v_{m_s}$ of $\Psi^{[s]}$ with the following properties:

(Integrality) $\vec v_1, \dots, \vec v_{m_s}$ all lie in $\mathbb{Z}^t$;

(Partial span) For every $1 \le i \le s$, $\vec v_1, \dots, \vec v_{m_i}$ span $\Psi^{[i]}$;

(Row echelon form) For each $1 \le j \le m_s$, there exists $l_j$, $1 \le l_j \le t$,
such that $\vec v_j$ has a non-zero $l_j$ coordinate, but such that $\vec v_{j'}$ has a zero
$l_j$ coordinate for all $j < j' \le m_s$.

For instance, the basis

$$\vec v_1 := (1, 1, 1, 1); \quad \vec v_2 := (0, 1, 2, 3); \quad \vec v_3 := (0, 0, 1, 3); \quad \vec v_4 := (0, 0, 0, 1)$$

we implicitly gave above for our running example is already in this form.

Fix such a basis. For each basis element $\vec v_j$, we can define the degree
$\deg(\vec v_j)$ of that element to be the first $i$ for which $j \le m_i$; thus $\deg(\vec v_j)$ is an
integer between 1 and $s$, and $\vec v_j \in \Psi^{[\deg(\vec v_j)]}$.

Observe that an arbitrary element of $G^\Psi$ can be expressed as a product of
finitely many elements of the form $g_j^{\vec v_j}$ for $1 \le j \le m_s$ and $g_j \in G_{(\deg(\vec v_j))}$. By
many applications of the Baker-Campbell-Hausdorff formula (see (C.1))
and Lemma 3.5 we can now express any element of $G^\Psi$ in the form

$$\prod_{j=1}^{m_s} g_j^{\vec v_j} \qquad (3.1)$$

where $g_j \in G_{(\deg(\vec v_j))}$ for all $1 \le j \le m_s$.

Footnote. Indeed, one uses (C.1) and Lemma 3.5 to extract out and collect all terms with
degree $\deg(\vec v_j) = 1$, leaving only terms with base $g_j$ in $G_{(2)}$. Then one extracts out those
terms with degree 2 (merging them with the $i = 1$ terms as necessary), leaving only terms
with base in $G_{(3)}$. Continuing this process gives the desired factorisation.

Thus, in our running example, we have the explicit description of $G^\Psi =
\mathrm{HP}^4(G)$ as

$$\{(g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3 g_3) : g_0 \in G_{(0)}, g_1 \in G_{(1)}, g_2 \in G_{(2)}, g_3 \in G_{(3)}\}.$$

Note that from results on the Taylor expansion (see Lemma A.1) this group
may also be identified as

$$\{(g(0), g(1), g(2), g(3)) : g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)\}.$$

The group nature of $\mathrm{HP}^4(G)$ is then easily deduced from Theorem 1.6, but
this presentation is somewhat specific to the Hall-Petresco case and we shall
not require it further.

From the row-echelon form one can verify inductively that the
representation (3.1) is unique (this can be seen clearly by working with the
Hall-Petresco example presented above). This gives $G^\Psi$ the structure of a
connected, simply connected Lie group, with dimension

$$\dim(G^\Psi) = \sum_{i=1}^{s} \dim(G_{(i)}) \big(\dim(\Psi^{[i]}) - \dim(\Psi^{[i-1]})\big) \qquad (3.2)$$

(with the convention that $\Psi^{[0]}$ is trivial). A similar argument also shows
that every element of $G^\Psi_{(i_0)}$ can be expressed uniquely in the form (3.1),
where now $g_j$ is constrained to lie in $G_{(\max(\deg(\vec v_j), i_0))}$ rather than $G_{(\deg(\vec v_j))}$.
In particular, by reading off the coefficients $g_j$ one at a time, this implies
the pleasant identity

$$G^\Psi_{(i)} = G^\Psi \cap (G_{(i)})^t. \qquad (3.3)$$
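
As a sanity check of the dimension formula (3.2) and of the unique representation (3.1), consider the running example with a specific group. The choice of the Heisenberg group with its lower central series filtration of degree $s = 2$ is our own illustrative assumption and is not taken from the text. Here $\dim(G_{(1)}) = 3$ and $\dim(G_{(2)}) = 1$, while $\dim(\Psi^{[1]}) = 2$ and $\dim(\Psi^{[2]}) = 3$, so that

$$\dim(G^\Psi) = 3 \cdot (2 - 0) + 1 \cdot (3 - 2) = 7.$$

This agrees with a direct count of parameters: since $G_{(3)}$ is trivial in this case, the elements of $\mathrm{HP}^4(G)$ take the form $(g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3)$ with $g_0, g_1 \in G$ and $g_2 \in G_{(2)}$, giving $3 + 3 + 1 = 7$ free parameters.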

Remark. From Taylor expansion (see Lemma A.1) we see that the
sequence $g^\Psi$ in (1.7) lies in $\mathrm{poly}(\mathbb{Z}^D, G^\Psi_\bullet)$. While we do not directly use this
fact here, it may help explain why the filtration $G^\Psi_\bullet$ plays a prominent
role in the proof of the counting lemma that we will shortly come to.

Recall that we normalised the basis vectors $\vec v_j \in \mathbb{Z}^t$ to have integer
coefficients. As a consequence, we see that if the $g_j$ are in $\Gamma$, then the expression
(3.1) lies in $\Gamma^t$. From this (and many applications of Lemma 3.5) we see
that $\Gamma^\Psi_{(i)} := \Gamma^t \cap G^\Psi_{(i)}$ is cocompact in $G^\Psi_{(i)}$ for each $i$, and so $(G^\Psi/\Gamma^\Psi, G^\Psi_\bullet)$
is a filtered nilmanifold. Furthermore, the same argument shows that the
$G^\Psi_{(i)}$ are rational subgroups of $G^t$ and so $(G^\Psi/\Gamma^\Psi, G^\Psi_\bullet)$ is a subnilmanifold
of $(G^t/\Gamma^t, G^t_\bullet)$.

The counting lemma: preliminary manoeuvres. Now that we have
verified that $G^\Psi/\Gamma^\Psi$ is indeed a nilmanifold, we can begin the proof of
Theorem 1.11.

We begin with some easy reductions. First, observe that for fixed $M$, there
are only finitely many possibilities for $s, D, t, \Psi$, and (up to isomorphism)
there are only finitely many possibilities for $(G/\Gamma, G_\bullet)$ and $\Gamma$. Thus it will
suffice to establish the result for a single choice of $s, D, t, \Psi, (G/\Gamma, G_\bullet)$, with
the bounds depending on these quantities. Hence, we fix these quantities
and allow all implicit constants to depend on these quantities (thus, in this
section, we will not explicitly subscript out $O(1)$ quantities).

Similarly, because the space of Lipschitz functions with Lipschitz norm 
0(1) is precompact in the uniform topology (by the Arzela-Ascoli theorem), 
it suffices to prove the desired bound for each fixed F, as the uniformity in 
F then follows from an easy approximation argument. Thus we fix F and 
allow all quantities to depend on F. 

Next, we observe that we may normalise $g(0) = \mathrm{id}$. Indeed, we may
factorise $g(0) = c_0 \gamma_0$ where $d_G(c_0, \mathrm{id}) = O(1)$ and $\gamma_0 \in \Gamma$. Factorising, we
obtain

$$g(n) = c_0 g'(n) \gamma_0$$

where $g'(n) := c_0^{-1} g(n) \gamma_0^{-1}$. Note that $g'(0) = \mathrm{id}$ and that the Taylor
coefficients of $g'$ are given by $g'_i = \gamma_0 g_i \gamma_0^{-1}$, and so $g'$ is also $(A, N)$-irrational.
It is then an easy matter to see that Theorem 1.11 for $g$ and $F$ follows from
Theorem 1.11 for $g'$ and for the shifted function $F'(x) := F(c_0 x)$, which is
still Lipschitz with norm $O(1)$.

Note that we may assume that $A$ and $N$ are large, as the claim is trivial
otherwise.

Equidistribution in the Leibman group. Let us recall what we
are trying to prove. In the counting lemma, Theorem 1.11, our aim is to
show that if $g(n)$ is suitably irrational then the orbit $(g^\Psi(\mathbf{n}))_{\mathbf{n} \in (\mathbf{n}_0 + \Lambda) \cap P}$ is
equidistributed on the Leibman nilmanifold $G^\Psi/\Gamma^\Psi$. We shall proceed by
contradiction, supposing this orbit is not equidistributed and deducing that
$g(n)$ could not have been irrational. The reader should recall the definition
of irrational in this context: it is given in Definition A.6.

Our main tool will be a mild generalisation of the "multiparameter Leibman
criterion", which is [28, Theorem 8.6]. Here is the statement we shall
use.

Theorem 3.6. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of complexity
$\le M$ and that $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a polynomial sequence for some $D \le M$.
Suppose that $\Lambda \subset \mathbb{Z}^D$ is a lattice of index $\le M$, that $\mathbf{n}_0 \in \mathbb{Z}^D$ has magnitude
$\le M$, and that $P \subset [-N, N]^D$ is a convex body. Suppose that $\delta > 0$, and
that


for some Lipschitz function $F : G/\Gamma \to \mathbb{C}$. Then there is a nontrivial
homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$, has complexity $O_M(1)$ and
such that



Remarks. This differs from [28, Theorem 8.6] in several insubstantial
ways. On the one hand we have no concern here with the polynomial bounds
that were important in that setting. However, we are dealing here with a
sublattice $\Lambda \subset \mathbb{Z}^D$ rather than $\mathbb{Z}^D$ itself, and with an arbitrary convex body
$P$ rather than the box $[N]^D$. This more general result can be deduced from
[28, Theorem 8.6] in a somewhat routine, though slightly tedious, manner.
We sketch the details in Appendix B. The notation $C^\infty([N]^D)$ is recalled
both in the appendix and later in this section.

Later on, the notation will get a little complicated. Let us, then, first
apply Theorem 3.6 to establish the following very simple special case of the
counting lemma (it is, of course, the special case in which $\Psi$ consists of the
single form $\psi_1(n) = n_1$).

Lemma 3.7 (Irrational implies equidistributed). Suppose that $(G/\Gamma, G_\bullet)$ is
a filtered nilmanifold of complexity at most $M$ and that $g : \mathbb{Z} \to G$ is an
$(A, N)$-irrational polynomial sequence. Then we have the equidistribution
property

$$\mathbb{E}_{n \in [N]} F(g(n)\Gamma) = \int_{G/\Gamma} F + O_M(A^{-c_M} \|F\|_{\mathrm{Lip}})$$

for all Lipschitz $F : G/\Gamma \to \mathbb{C}$ and some $c_M > 0$.

Proof. Suppose the conclusion is false. Then by Theorem 3.6 (see footnote 12) there is some
continuous homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $[G, G]$ and $\Gamma$, has
complexity $O_\delta(1)$, and for which $\|\eta \circ g\|_{C^\infty[N]} \le \delta^{-O(1)}$. Recall (cf. [28,
Definition 2.7]) what this means: in the Taylor expansion

$$\eta \circ g(n) = \alpha_0 + \alpha_1 \binom{n}{1} + \dots + \alpha_s \binom{n}{s},$$

the $j$th coefficient $\alpha_j$ satisfies $\|\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \le \delta^{-O(1)}/N^j$ for $j = 1, \dots, s$. If the
sequence $g$ is developed as a Taylor expansion

$$g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}},$$

then we of course have $\alpha_j = \eta(g_j)$. Choose $i$ maximal so that the restriction
$\eta|_{G_{(i)}}$ is nontrivial. Then certainly $\|\eta(g_i)\|_{\mathbb{R}/\mathbb{Z}} \le \delta^{-O(1)}/N^i$. We claim that
$\eta$ is an $i$-horizontal character in the sense of Definition A.5, a statement
which will clearly contradict the supposed $(A, N)$-irrationality of $g$ if $\delta$ is
a sufficiently small power of $1/A$. To this end all we need do is confirm
that $\eta$ vanishes on $G_{(i+1)}$, $\Gamma_{(i)}$ and on $[G_{(j)}, G_{(i-j)}]$ for $0 \le j \le i$. The
first of these follows from the maximality of $i$, whilst the second and third
follow immediately from the properties of $\eta$ stated at the beginning of the
proof. □

Let us turn now to the more notationally intensive general case. Now,
we apply Theorem 3.6 to $G^\Psi/\Gamma^\Psi$ to conclude that there is a non-trivial
continuous homomorphism $\eta : G^\Psi \to \mathbb{R}$ which maps $\Gamma^\Psi$ to $\mathbb{Z}$, has complexity
$O_\delta(1)$, and satisfies

$$\|\eta \circ g^\Psi\|_{C^\infty[N]^D} \le \delta^{-O(1)}.$$

Footnote 12. In fact in the proof of Lemma 3.7 we only need the rather simpler 1-parameter
version, which is [28, Theorem 1.16].

Much as in the proof of Lemma 3.7, what this means is that if $\eta \circ g^\Psi(\mathbf{n})$ is
developed as a Taylor series in multi-binomial coefficients $\binom{\mathbf{n}}{\mathbf{j}} = \binom{n_1}{j_1} \cdots \binom{n_D}{j_D}$
(see Lemma A.1), the coefficient $\alpha_{\mathbf{j}}$ satisfies $\|\alpha_{\mathbf{j}}\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-|\mathbf{j}|}$. Our aim is
to use this information to contradict the assumption that $g(n)$ is
$(A, N)$-irrational.

Let us once again take i maximal such that r/| G * is nontrivial. Consid- 
er 

ering again the Taylor expansion of g(n), we have 

(r,og*)(n)=J2v(g) 3 V--, 9} 3 ')■ (3.5) 

i=i 

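Take the basis $v_{j,1}, \dots, v_{j,m_j}$ for $\Psi^{[j]}$ described earlier. Then,
since the vector
\[ \Bigl(\tbinom{\psi_1(\mathbf{n})}{j}, \dots, \tbinom{\psi_t(\mathbf{n})}{j}\Bigr) \]
lies in $\Psi^{[j]}$, there is an expansion
\[ \Bigl(\tbinom{\psi_1(\mathbf{n})}{j}, \dots, \tbinom{\psi_t(\mathbf{n})}{j}\Bigr) = P_{j,1}(\mathbf{n}) v_{j,1} + \dots + P_{j,m_j}(\mathbf{n}) v_{j,m_j} \tag{3.6} \]
for $j = 1, \dots, i$, where the $P_{j,k} : \mathbb{Z}^D \to \mathbb{R}$ are polynomials of
degree at most $j$, recalling that $m_j := \dim(\Psi^{[j]})$. Comparing with (3.5),
we obtain
\[ (\eta \circ g^\Psi)(\mathbf{n}) = \sum_{j=1}^{i} \sum_{k=1}^{m_j} P_{j,k}(\mathbf{n}) \, \eta\bigl(g_j^{v_{j,k}}\bigr). \tag{3.7} \]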



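We are going to look at the coefficients $\alpha_{\mathbf{i}}$ of (3.7) for the
monomial $\mathbf{n}^{\mathbf{i}} := n_1^{i_1} \cdots n_D^{i_D}$, where
$\mathbf{i} = (i_1, \dots, i_D)$ and $|\mathbf{i}| := |i_1| + \dots + |i_D| = i$. We are
assuming that every such coefficient satisfies
$\|\alpha_{\mathbf{i}}\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-i}$. Note also that
\[ \alpha_{\mathbf{i}} = \sum_{k=1}^{m_i} (P_{i,k})_{\mathbf{i}} \, \eta\bigl(g_i^{v_{i,k}}\bigr), \tag{3.8} \]
where $(P_{j,k})_{\mathbf{i}}$ is the $\mathbf{n}^{\mathbf{i}}$ coefficient of
$P_{j,k}(\mathbf{n})$; this is because terms of total degree $i$ cannot arise from
the terms $j = 1, \dots, i-1$ in the sum on the right-hand side of (3.7).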

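On the other hand by taking $j = i$ in (3.6) we have
\[
\begin{aligned}
(P_{i,1})_{\mathbf{i}} v_{i,1} + \dots + (P_{i,m_i})_{\mathbf{i}} v_{i,m_i}
 &= \frac{1}{i_1! \cdots i_D!} \bigl(\psi_1(e_1)^{i_1} \cdots \psi_1(e_D)^{i_D}, \dots, \psi_t(e_1)^{i_1} \cdots \psi_t(e_D)^{i_D}\bigr) \\
 &= \frac{1}{i_1! \cdots i_D!} \Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D},
\end{aligned}
\tag{3.9}
\]
where $e_j = (0, \dots, 1, \dots, 0) \in \mathbb{Z}^D$, the 1 being in the $j$th position, and
$\Psi(e_i) := (\psi_1(e_i), \dots, \psi_t(e_i)) \in \mathbb{R}^t$.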
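Comparing (3.8) and (3.9) and using the fact that $\eta$ is a homomorphism
on $G^\Psi_{(i)}$, we obtain
\[ \alpha_{\mathbf{i}} = \frac{1}{i_1! \cdots i_D!} \, \eta\bigl(g_i^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr). \]
Thus, for each $\mathbf{i}$ with $|\mathbf{i}| = |i_1| + \dots + |i_D| = i$ we have
\[ \bigl\| \eta\bigl(g_i^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr) \bigr\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-i}. \tag{3.10} \]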



To obtain the desired contradiction with the $(A,N)$-irrationality hypothesis
and thus complete the proof, it suffices (after taking $A$ sufficiently large
depending on $\delta$) to establish that for at least one choice of $\mathbf{i}$, the
map $\xi_{\mathbf{i}} : G_{(i)} \to \mathbb{R}$ defined by
\[ \xi_{\mathbf{i}}(g) := \eta\bigl(g^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr) \]
is a nontrivial horizontal $i$-character of complexity $O_{M,\delta}(1)$.

The complexity bound follows from the fact that the coefficients of the
forms $\psi_i$ are integers of size $O(1)$ and the Baker-Campbell-Hausdorff
formula (Appendix C). That at least one of these maps is nontrivial follows
from the fact that $\eta$ is nontrivial on $G^\Psi_{(i)}$ and the fact that the vectors
$\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}$, $i_1 + \dots + i_D = i$, span $\Psi^{[i]}$ (a
consequence of Lemma 3.3).

Furthermore $\xi_{\mathbf{i}}$ always annihilates $\Gamma_{(i)}$ and $G_{(i+1)}$ (by the
asserted maximality of $i$). To qualify as an $i$-horizontal character we must
also show that it vanishes on $[G_{(j)}, G_{(i-j)}]$ for each $0 \le j \le i$. To this
end, note that we may factor
\[ \Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D} = w w', \]
where $w \in \Psi^{[j]}$ and $w' \in \Psi^{[i-j]}$. Indeed, we may take
\[ w = \Psi(e_1)^{j_1} \cdots \Psi(e_D)^{j_D}, \qquad w' = \Psi(e_1)^{i_1 - j_1} \cdots \Psi(e_D)^{i_D - j_D} \]
for any indices $j_1, \dots, j_D$ with $j_l \le i_l$ and $j_1 + \dots + j_D = j$, whereupon
the relevant containments follow from Lemma 3.3. Now if $g \in G_{(j)}$ and
$g' \in G_{(i-j)}$ are arbitrary then we have
\[ [g^{w}, g'^{w'}] \equiv [g, g']^{ww'} \pmod{G^\Psi_{(i+1)}} \]
by the Baker-Campbell-Hausdorff formula (C.2). Applying $\eta$, which is trivial
on $G^\Psi_{(i+1)}$ by assumption, we obtain
\[ \xi_{\mathbf{i}}([g, g']) = \eta([g, g']^{ww'}) = \eta([g^{w}, g'^{w'}]) = 0, \]
the last step being a consequence of the fact that $\eta$ has abelian image and
hence vanishes on $[G^\Psi, G^\Psi]$. This concludes the proof of the counting lemma,
Theorem 1.11.
4. Generalised von Neumann type theorems

In this section we recall a number of results asserting the connection 
between Gowers norms and various types of linear configuration. These 
results are collectively known in the literature as "generalised von Neumann 
theorems" . The connection between Gowers norms (not called by that name, 
of course) and linear configurations was first made in [15]. A fairly general
result of this type, which appears in [29], is the following.

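Theorem 4.1 (Generalised von Neumann Theorem). Let $\Psi = (\psi_1, \dots, \psi_t)$
be a collection of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$ for some
$t, D \ge 1$, any two of which are linearly independent. Then there exists an
integer $s = s(\Psi)$ with the property that one has the inequality
\[ \Bigl| \mathbb{E}_{\mathbf{n} \in [N]^D} \prod_{i=1}^{t} f_i(\psi_i(\mathbf{n})) \Bigr| \ll_{\Psi} \inf_{1 \le i \le t} \|f_i\|_{U^{s+1}[N]} \]
for all $N \ge 1$ and all $f_1, \dots, f_t : [N] \to \mathbb{C}$ bounded in magnitude by 1.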

Remarks. A natural value of $s(\Psi)$ comes from the proof in [29], which
proceeds via $s$ applications of the Cauchy-Schwarz inequality. For this reason
Gowers and Wolf [20] call $s(\Psi)$ the Cauchy-Schwarz complexity of the system
$\Psi$. There is a linear-algebra recipe for computing $s(\Psi)$ which is not
especially enlightening but sufficiently simple that we can give it here (see
the introduction to [29] for more details). If $1 \le i \le t$ and $s \ge 0$ then we
say that $\Psi$ has $i$-complexity at most $s$ if one can cover the $t-1$ forms
$\{\psi_j : j \in [t] \setminus \{i\}\}$ by $s+1$ classes, such that $\psi_i$ does not lie in the
linear span of the forms in any one of these classes. Then $s(\Psi)$ is the
smallest $s$ for which the system has $i$-complexity at most $s$ for all
$1 \le i \le t$. Note, then, that the Cauchy-Schwarz complexity of the system
$\Psi = \{n_1, n_1 + n_2, \dots, n_1 + (k-1)n_2\}$ corresponding to a $k$-term arithmetic
progression is $k - 2$. As a final remark, let us note that Theorem 4.1, as
proved in [29, Appendix C], is regrettably somewhat difficult to understand
as we had to establish a more general result in which the functions $f_i$ were
bounded by an arbitrary pseudorandom measure, and this is notationally
heavy. For a gentle explanation of the special case
$\Psi = \{n_1, n_1 + n_2, n_1 + 2n_2, n_1 + 3n_2\}$ (where $s = 2$) the reader may
consult [24, Proposition 1.11]. A sketch of the proof of Theorem 4.1 is also
given in [20, §2]. See also [5] for a variant of these notions of complexity in
the ergodic setting, and for polynomial forms instead of linear ones.
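
The linear-algebra recipe in the remarks above can be checked by brute force on small systems. The following Python sketch is purely illustrative and not part of the argument: the function names are our own, forms are represented as integer coefficient tuples (an assumption about encoding), and the search over all labellings of the $t-1$ remaining forms is exponential in $t$, so it is only suitable as a sanity check of the value $k-2$ for $k$-term progressions.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a list of integer vectors, computed exactly over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def in_span(v, vectors):
    """True if v lies in the linear span of `vectors` (span of nothing is {0})."""
    if not vectors:
        return all(x == 0 for x in v)
    return rank(list(vectors) + [v]) == rank(list(vectors))


def i_complexity_at_most(forms, i, s):
    """Can the forms other than forms[i] be split into s+1 classes,
    none of whose spans contains forms[i]?"""
    others = [f for j, f in enumerate(forms) if j != i]
    for labels in product(range(s + 1), repeat=len(others)):
        classes = [[f for f, lab in zip(others, labels) if lab == c]
                   for c in range(s + 1)]
        if all(not in_span(forms[i], cls) for cls in classes):
            return True
    return False


def cs_complexity(forms):
    """Smallest s such that the system has i-complexity at most s for every i.
    Returns None if no s < t works (e.g. two forms are proportional)."""
    t = len(forms)
    for s in range(t):
        if all(i_complexity_at_most(forms, i, s) for i in range(t)):
            return s
    return None


# The k-term progression n1, n1 + n2, ..., n1 + (k-1) n2 as coefficient tuples.
for k in (3, 4, 5):
    progression = [(1, j) for j in range(k)]
    print(k, cs_complexity(progression))
```

For progressions any two of the forms already span everything, so each class can contain at most one form, which is why the search only succeeds once $s + 1 = k - 1$.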

We will need a twisted version of the Generalised von Neumann inequality, 
in which an additional nilsequence of lower degree is inserted. We shall not 
need it for general linear forms, so we formulate just the special case we 
need. 

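Lemma 4.2 (Twisted generalised von Neumann theorem). Let $k \ge 3$, let
$f_0, \dots, f_{k-1} : [N] \to \mathbb{C}$ be bounded in magnitude by 1, let
$c_0, \dots, c_{k-1}$ be distinct integers, and let $F(g(n)\Gamma)$ be a degree
$\le (k-2)$ nilsequence of complexity at most $M$. Then
\[ \Bigl| \mathbb{E}_{n \in [N]; d \in [-N,N]} F(g(d)\Gamma) \prod_{i=0}^{k-1} f_i(n + c_i d) \Bigr| \ll_{k,M,c_0,\dots,c_{k-1}} \inf_{0 \le i \le k-1} \|f_i\|_{U^{k-1}[N]}. \]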

Proof. We induct on $k$, starting with the case $k = 3$. The underlying
nilmanifold $G/\Gamma$ is then a torus $(\mathbb{R}/\mathbb{Z})^m$ with $m = O_M(1)$, and
$g(n) = \theta n + \theta_0$ may be taken to be linear. By a standard Fourier
decomposition we may assume that $F(x) = e(\xi \cdot x)$ for some $\xi \in \mathbb{Z}^m$
with $|\xi| = O_M(1)$, in which case we may rewrite the estimate to be proven as
\[ \bigl| \mathbb{E}_{n \in [N]; d \in [-N,N]} f_0(n + c_0 d) f_1'(n + c_1 d) f_2'(n + c_2 d) \bigr| \ll_{k,M} \inf_{i} \|f_i\|_{U^2[N]}, \]
where $f_1'(n) = f_1(n) e(-(c_2 - c_1)^{-1} \xi \cdot \theta n)$ and
$f_2'(n) = f_2(n) e((c_2 - c_1)^{-1} \xi \cdot \theta n)$. However it is easy to establish
the invariance properties $\|f_1'\|_{U^2} = \|f_1\|_{U^2}$ and
$\|f_2'\|_{U^2} = \|f_2\|_{U^2}$, and so the result follows immediately from
Theorem 4.1.
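
The invariance of the $U^2$ norm under linear phase modulation used in the $k = 3$ case is easy to test numerically. The sketch below is only an illustration under a simplifying assumption: it works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ with integer frequencies rather than on the interval $[N]$, so that the phases cancel exactly; the helper names are ours.

```python
import cmath


def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def u2_norm_fourth_power(f):
    """||f||_{U^2}^4 on Z/N: average of
    f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2) over x, h1, h2."""
    N = len(f)
    total = 0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x]
                          * f[(x + h1) % N].conjugate()
                          * f[(x + h2) % N].conjugate()
                          * f[(x + h1 + h2) % N])
    return (total / N ** 3).real


def modulate(f, a):
    """Multiply f by the linear phase n -> e(a n / N), with a an integer."""
    N = len(f)
    return [f[n] * e(a * n / N) for n in range(N)]


f = [complex(v) for v in (1, 0, 1, 1, 0, 1, 0)]
g = modulate(f, 3)
print(u2_norm_fourth_power(f), u2_norm_fourth_power(g))
```

The two printed values agree up to rounding because the phase contributes $a(x - (x+h_1) - (x+h_2) + (x+h_1+h_2)) = 0$ to each term of the average.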

Now suppose that $k \ge 4$ and the claim has already been proven for smaller
$k$. By permuting indices and then translating $n$, it suffices to show that
\[ \Bigl| \mathbb{E}_{n \in [N]; d \in [-N,N]} F(g(d)\Gamma) \prod_{i=0}^{k-1} f_i(n + c_i d) \Bigr| \ll_{k,M,c_0,\dots,c_{k-1}} \|f_{k-1}\|_{U^{k-1}[N]} \tag{4.2} \]
under the assumption that $c_0 = 0$.

Recall from [28] that we define a vertical character to be a continuous
homomorphism $\xi : G_{(k-2)}/(G_{(k-2)} \cap \Gamma) \to \mathbb{R}/\mathbb{Z}$. We say that $F$
has vertical frequency $\xi$ if one has $F(g_{k-2} x) = e(\xi(g_{k-2})) F(x)$ for all
$x \in G/\Gamma$ and $g_{k-2} \in G_{(k-2)}$. By a standard Fourier decomposition in the
vertical direction (e.g. by arguing exactly as in [28, Lemma 3.7]) we may
assume without loss of generality that $F$ has a vertical frequency $\xi$.

Applying the Cauchy-Schwarz inequality, we can bound the left-hand side
of (4.2) by
\[ \ll \Bigl| \mathbb{E}_{n \in [N]; h,d \in [-N,N]} F(g(d+h)\Gamma) \overline{F(g(d)\Gamma)} \prod_{i=1}^{k-1} f_i(n + c_i d + c_i h) \overline{f_i(n + c_i d)} \Bigr|^{1/2}. \]
Because $F$ has a vertical frequency, $F(g(d+h)\Gamma)\overline{F(g(d)\Gamma)}$ is a degree
$\le (k-3)$ nilsequence of complexity $O_{M,k}(1)$ (see [28, Proposition 7.2]).
Applying the induction hypothesis, we may thus bound the above expression by
\[ \ll_{M,k,c_0,\dots,c_{k-1}} \bigl( \mathbb{E}_{h \in [-N,N]} \|\Delta_{c_{k-1} h} f_{k-1}\|_{U^{k-2}[N]} \bigr)^{1/2}, \]
where $\Delta_h f(n) := f(n+h)\overline{f(n)}$, which by Holder's inequality can be
bounded by
\[ \ll_{M,k,c_0,\dots,c_{k-1}} \bigl( \mathbb{E}_{h \in [-|c_{k-1}|N, |c_{k-1}|N]} \|\Delta_{h} f_{k-1}\|_{U^{k-2}[N]}^{2^{k-2}} \bigr)^{1/2^{k-1}} \]
and the claim follows from the recursive definition of the Gowers norms. □
Remark. The above argument is very similar to the short proof presented
in |3H Appendix G] that $s$-step nilsequences obstruct uniformity in the
$U^{s+1}$-norm (that is, the inverse conjecture GI(s) is an if-and-only-if
statement).

5. On a conjecture of Bergelson, Host, and Kra 

We now apply the arithmetic regularity and counting lemmas to establish
Theorem 1.12, the proof of the conjecture of Bergelson, Host and Kra. It
will suffice to prove the following claim.

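Theorem 5.1. Let $k = 1, 2, 3$ or $4$, and suppose that $0 < \alpha < 1$ and
$\varepsilon > 0$. Then for any $N \ge 1$ and any subset $A \subseteq [N]$ of density
$|A| \ge \alpha N$, one can find a function $\mu : \mathbb{Z} \to \mathbb{R}^+$ such that
\[ \mathbb{E}_{d \in [-N,N]} \mu(d) = 1 + O(\varepsilon) \tag{5.1} \]
and
\[ \sup_{d \in [-N,N]} \mu(d) \ll_{\alpha,\varepsilon} 1 \tag{5.2} \]
such that
\[ \mathbb{E}_{n \in [N]; d \in [-N,N]} 1_A(n) 1_A(n+d) \cdots 1_A(n + (k-1)d) \mu(d) \ge \alpha^k - O(\varepsilon). \tag{5.3} \]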

Indeed, from (5.1), (5.3), we see that we have
\[ \mathbb{E}_{n \in [N]} 1_A(n) 1_A(n+d) \cdots 1_A(n + (k-1)d) \ge \alpha^k - O(\varepsilon) \]
for all $d$ in a subset $E$ of $[-N,N]$ with
$\mathbb{E}_{d \in [-N,N]} 1_E(d) \mu(d) \gg_{\alpha,\varepsilon} 1$. From (5.2) we conclude that
$|E| \gg_{\alpha,\varepsilon} N$, and Theorem 1.12 follows (after shrinking $\varepsilon$ by an
absolute constant). Conversely, it is not difficult to deduce Theorem 5.1
from Theorem 1.12.

It remains to establish Theorem 5.1. We may assume that $N$ is large
depending on $\alpha, \varepsilon$ as the claim is trivial otherwise (just take $\mu$ to be the
Kronecker delta function at 0).

For $k = 1$ one can simply take $\mu \equiv 1$. For $k = 2$, we first observe that
\[ \mathbb{E}_{n \in [N]} \mathbb{E}_{h \in [-\varepsilon N, \varepsilon N]} 1_A(n + h) = \alpha + O(\varepsilon); \]
applying Cauchy-Schwarz we conclude that
\[ \mathbb{E}_{h,h' \in [-\varepsilon N, \varepsilon N]} \mathbb{E}_{n \in [N]} 1_A(n + h) 1_A(n + h') \ge \alpha^2 - O(\varepsilon). \]
The claim then follows, with $\mu$ being the probability density function of
$h - h'$ as $h, h'$ range uniformly in $[-\varepsilon N, \varepsilon N]$.
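
To make the weight in the $k = 2$ case concrete, the following Python sketch tabulates the density of $h - h'$ exactly. It is a minimal illustration, not the authors' construction verbatim: the names `difference_weight`, `H` and `N` are ours, $H$ plays the role of $\lfloor \varepsilon N \rfloor$, and we assume $2H \le N$ so that the whole support fits inside $[-N, N]$. The weight is rescaled by $2N + 1$ so that its average over $[-N, N]$ is exactly 1, in the spirit of (5.1), and its supremum is $(2N+1)/(2H+1)$, which is of order $1/\varepsilon$ as (5.2) requires.

```python
from fractions import Fraction


def difference_weight(H, N):
    """mu(d) = (2N+1) * P(h - h' = d) for h, h' independent and uniform on
    {-H, ..., H}. Assumes 2H <= N so the support lies inside [-N, N]."""
    size = 2 * H + 1
    mu = {}
    for d in range(-N, N + 1):
        # Number of pairs (h, h') in {-H..H}^2 with h - h' = d.
        pairs = max(0, size - abs(d))
        mu[d] = Fraction((2 * N + 1) * pairs, size * size)
    return mu


mu = difference_weight(H=3, N=10)
average = sum(mu.values()) / len(mu)
print(average, max(mu.values()))
```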

Now we turn to the cases $k = 3, 4$. Let $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a
sufficiently rapidly growing function depending on $\alpha, \varepsilon$ in a manner to be
specified later. We apply Theorem 1.2 with $s := k - 2$ to obtain a quantity
$M = O_{\varepsilon,\mathcal{F}}(1)$ and a decomposition
\[ 1_A(n) = f_{nil}(n) + f_{sml}(n) + f_{unf}(n) \tag{5.4} \]
such that

(i) $f_{nil}(n)$ is a $(\mathcal{F}(M), N)$-irrational degree $\le k - 2$ virtual nilsequence
of complexity at most $M$ and scale $N$;

(ii) $f_{sml}$ has an $L^2[N]$ norm of at most $\varepsilon/100$;

(iii) $f_{unf}$ has a $U^{k-1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(iv) $f_{nil}$, $f_{sml}$, $f_{unf}$ are all bounded in magnitude by 1; and

(v) $f_{nil}$ and $f_{nil} + f_{sml}$ are non-negative.

It is clear that $|\mathbb{E}_{n \in [N]} f_{sml}(n)| = O(\varepsilon)$, and furthermore, by
Theorem 4.1 (setting all but one of the functions equal to 1) we also have
$|\mathbb{E}_{n \in [N]} f_{unf}(n)| = O(\varepsilon)$ if $\mathcal{F}$ grows rapidly enough. Therefore
\[ \mathbb{E}_{n \in [N]} f_{nil}(n) \ge \alpha - O(\varepsilon). \tag{5.5} \]

The heart of the matter is the following proposition. 

Proposition 5.2 (Bergelson-Host-Kra for $f_{nil}$). Let $k = 3, 4$. Then there
exists a non-negative $(k-2)$-step nilsequence $\mu : \mathbb{Z} \to \mathbb{R}^+$ of complexity
$O_{\alpha,\varepsilon,M}(1)$ obeying the normalisation
\[ \mathbb{E}_{d \in [N]} \mu(d) = 1 + O(\varepsilon) \tag{5.6} \]
and such that
\[ \mathbb{E}_{n,d \in [N]} f_{nil}(n) f_{nil}(n + d) \cdots f_{nil}(n + (k-1)d) \mu(d) \ge \alpha^k - O(\varepsilon). \tag{5.7} \]

Deduction of Theorem 5.1 from Proposition 5.2. Using (5.4), one can expand
the left-hand side of (5.3) into $3^k$ terms, one of which is (5.7). As
for the other terms, any term involving at least one copy of $f_{unf}$ is of size
$O_{\alpha,\varepsilon,M}(1/\mathcal{F}(M))$ by Lemma 4.2 and the $U^{k-1}$ norm bound on $f_{unf}$.
Finally, consider a term that involves at least one copy of $f_{sml}$. Suppose
first that we have a term that involves $f_{sml}(n)$. Then after performing the
average in $d$ using (5.6), we see that this term is
$O(\mathbb{E}_{n \in [N]} |f_{sml}(n)|)$, which is $O(\varepsilon)$ by the $L^2[N]$ bound on $f_{sml}$
and the Cauchy-Schwarz inequality. Similarly for any term that involves
$f_{sml}(n + id)$, after making a change of variables $(n', d) := (n + id, d)$.
Putting all this together we obtain the result. □

It remains, of course, to establish Proposition 5.2. We may assume that
$N$ is sufficiently large depending on $\alpha, \varepsilon, M$, as the claim is trivial
otherwise by taking $\mu$ to be a delta function.

We first establish the proposition in the easier of the two cases, namely
the case $k = 3$. This was previously considered in [23]. In this case it is
actually simpler to work with the weak regularity lemma, Proposition 2.7,
in which the degree 1 polynomial sequence $g(n)$ is not required to be
irrational. Note that we have not made any use of irrationality so far, though
we shall do so later when discussing the case $k = 4$. We may identify $G/\Gamma$
with $(\mathbb{R}/\mathbb{Z})^m$ for some $m = O_M(1)$ and, by modulating $F$ if necessary, we
may suppose that $g(n) = \theta n$ is linear with no constant term, where
$\theta \in \mathbb{R}^m$. Then
\[ f_{nil}(n) = F(n\theta), \]
where $F : (\mathbb{R}/\mathbb{Z})^m \to \mathbb{C}$ has Lipschitz norm $O_M(1)$.
Let $\varepsilon' > 0$ be a small number depending on $\varepsilon$ and $M$ to be chosen later,
and let $B_1, B_2 \subseteq [-N, N]$ be the two Bohr sets
\[ B_1 := \{d \in [-\varepsilon' N, \varepsilon' N] : \mathrm{dist}_{(\mathbb{R}/\mathbb{Z})^m}(\theta d, 0) \le \varepsilon'\} \]
and
\[ B_2 := \{d \in [-\varepsilon' N, \varepsilon' N] : \mathrm{dist}_{(\mathbb{R}/\mathbb{Z})^m}(\theta d, 0) \le \varepsilon'/2\}. \]
By the usual Dirichlet pigeonhole argument we see that $|B_2| \gg_{\varepsilon',M} N$. Also,
from the Lipschitz nature of $F$, we see that
\[ f_{nil}(n + d) = f_{nil}(n) + O_M(\varepsilon') \]
whenever $d \in B_1$ and $n \in [-(1 - \varepsilon')N, (1 - \varepsilon')N]$. As a consequence, it
follows that
\[ \mathbb{E}_{n \in [N]} f_{nil}(n) f_{nil}(n + d) f_{nil}(n + 2d) = \mathbb{E}_{n \in [N]} f_{nil}(n)^3 + O_M(\varepsilon') \]
for such $d$. However from (5.5) and Holder's inequality one has
\[ \mathbb{E}_{n \in [N]} f_{nil}(n)^3 \ge \alpha^3 - O(\varepsilon). \]
Proposition 5.2 (in the case $k = 3$) now follows by taking $\mu(d) = c\psi(\theta d)$,
where $\psi : (\mathbb{R}/\mathbb{Z})^m \to [0,1]$ is an $O_{M,\varepsilon'}(1)$-Lipschitz function which is
1 on $B_2$ and 0 outside $B_1$, $c = O_{M,\varepsilon'}(1)$ is a suitable normalisation
constant, and by taking $\varepsilon'$ to be suitably small.
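
For readers who wish to experiment with the Bohr sets $B_1$ and $B_2$, the following Python sketch enumerates them directly. It is a hedged illustration only: the function names are ours, the frequency vector `theta` is an arbitrary example, and we assume that the distance on $(\mathbb{R}/\mathbb{Z})^m$ is measured in the sup norm, which the text does not specify.

```python
import math


def torus_dist(x):
    """Sup-norm distance from the point x (a list of reals) to the nearest
    integer point, i.e. the distance to 0 in (R/Z)^m (assumed metric)."""
    return max(abs(t - round(t)) for t in x)


def bohr_set(theta, N, eps, radius):
    """All d in [-eps N, eps N] with dist(theta * d, 0) <= radius."""
    H = math.floor(eps * N)
    return [d for d in range(-H, H + 1)
            if torus_dist([t * d for t in theta]) <= radius]


theta = [math.sqrt(2), math.sqrt(3)]   # example frequencies, m = 2
N, eps = 2000, 0.1
B1 = bohr_set(theta, N, eps, eps)
B2 = bohr_set(theta, N, eps, eps / 2)
print(len(B1), len(B2))
```

By construction $B_2 \subseteq B_1$, both sets are symmetric about the origin, and both contain $d = 0$; the Lipschitz cutoff $\psi$ in the text interpolates between them.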

We now turn to the $k = 4$ case of Proposition 5.2. For simplicity let us
first consider the model case when $f_{nil}$ is a genuine nilsequence and not just
a virtual nilsequence, that is to say
\[ f_{nil}(n) = F(g(n)\Gamma) \tag{5.8} \]
where $(G/\Gamma, G_\bullet)$ is a degree $\le 2$ filtered nilmanifold of complexity $O_M(1)$,
and $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $(\mathcal{F}(M), N)$-irrational. By Taylor expansion (see
Appendix A), we have
\[ g(n) = g_0 g_1^{n} g_2^{\binom{n}{2}} \]
for some $g_0, g_1 \in G$ and $g_2 \in G_{(2)}$. The $(\mathcal{F}(M), N)$-irrationality of $g$
ensures certain irrationality properties on $g_1$ and $g_2$, though we will not need
these properties explicitly here, as we will only be using them through the
counting lemma (Theorem 1.11), which we shall be using as a black box.

Let $\pi : G \to T_1$ be the projection homomorphism to the torus$^{13}$
$T_1 := G/(G_{(2)}\Gamma)$. Then
\[ \pi(g(n)) = \pi(g_0)\pi(g_1)^n. \]
Let $\varepsilon' > 0$ be a small quantity depending on $\varepsilon, M$ to be chosen later. We set
\[ \mu(d) := c \, 1_{[-\varepsilon' N, \varepsilon' N]}(d) \, \phi(\pi(g_1)^d), \]
where, much as in the analysis of the case $k = 3$, $\phi : T_1 \to \mathbb{R}^+$ is a smooth
non-negative cutoff to the ball of radius $\varepsilon'$ centered at the origin that is
not identically zero, and $c$ is a normalisation constant to be chosen shortly.

13 Note this is not quite the same thing as the horizontal torus, which is so
important in [28], which is $(G/\Gamma)_{ab} := G/[G,G]\Gamma$.
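From Theorem 1.11 one has
\[ \mathbb{E}_{d \in [-\varepsilon' N, \varepsilon' N]} \phi(\pi(g_1)^d) = \int_{T_1} \phi + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1). \]
Thus if we set
\[ c := \Bigl( \int_{T_1} \phi \Bigr)^{-1} = O_{\varepsilon',M}(1) \tag{5.9} \]
then we have the normalisation (5.6), if $\mathcal{F}$ is sufficiently rapid, depending on
the way in which $\varepsilon'$ depends on $\varepsilon, M$, and $N$ is sufficiently large depending
on $\varepsilon, \varepsilon', M$. From the bound on $c$ we see that $\mu$ is a degree $\le 1$ (and hence
also degree $\le 2$) nilsequence of complexity $O_{\varepsilon',M}(1)$.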

We now apply the counting lemma, Theorem 1.11, to conclude that
\[ \mathbb{E}_{n,d \in [N]} f_{nil}(n) f_{nil}(n + d) f_{nil}(n + 2d) f_{nil}(n + 3d) \mu(d) = \int_{G^\Psi/\Gamma^\Psi} \tilde F + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1) \tag{5.10} \]
where $G^\Psi \le G^4$ is the Leibman group associated to the collection
$\Psi = (\psi_0, \psi_1, \psi_2, \psi_3) : \mathbb{Z}^2 \to \mathbb{Z}^4$ of linear forms
$\psi_i(\mathbf{n}) := n_1 + i n_2$, $i = 0, 1, 2, 3$, that is to say the Hall-Petresco group
$HP^4(G)$, and $\tilde F : G^\Psi/\Gamma^\Psi \to \mathbb{C}$ is the function
\[ \tilde F(x_0, x_1, x_2, x_3) := c \, \phi(\pi(x_1)\pi(x_0)^{-1}) F(x_0) F(x_1) F(x_2) F(x_3) \]
(here we use the identity $\pi(g(n+d))\pi(g(n))^{-1} = \pi(g_1)^d$, immediately
verified from the Taylor expansion).

We now do some calculations in the Hall-Petresco group very similar to
those in [4]. We saw in §3 that
\[ G^\Psi = \{(g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3) : g_0, g_1 \in G, g_2 \in G_{(2)}\} \]
(note, of course, that $G_{(3)} = \mathrm{id}$ in the case we are considering). For our
calculations it is convenient to use the following obviously equivalent
representation:
\[ G^\Psi = \{(g_0 g_{2,0}, g_0 g_1 g_{2,1}, g_0 g_1^2 g_{2,2}, g_0 g_1^3 g_{2,3}) : g_0, g_1 \in G; \; g_{2,0}, \dots, g_{2,3} \in G_{(2)}; \; g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}\}. \]
Here we have taken note of the fact that
\[ \Psi^{[2]} = \{(x_0, x_1, x_2, x_3) \in \mathbb{R}^4 : x_0 - 3x_1 + 3x_2 - x_3 = 0\}. \]
This last equation is quite special in that it exhibits a certain "positivity",
as we shall see later; this is key to our argument. The lattice $\Gamma^\Psi$ can be
similarly described by requiring $g_0, g_1, g_{2,0}, \dots, g_{2,3}$ to also lie in $\Gamma$. As a
consequence of this, an arbitrary point of the nilmanifold $G^\Psi/\Gamma^\Psi$ can be
parameterised uniquely as
\[ (g_0 g_{2,0}, g_0 g_1 g_{2,1}, g_0 g_1^2 g_{2,2}, g_0 g_1^3 g_{2,3})\Gamma^\Psi \tag{5.11} \]
where $g_0, g_1$ lie in a fundamental domain $\Sigma_1 \subset G$ of the horizontal torus
$T_1$ (i.e. a smooth manifold with boundary on which $\pi$ is a bijection from
$\Sigma_1$ to $T_1$), and $g_{2,0}, \dots, g_{2,3}$ lie in a fundamental domain
$\Sigma_2 \subset G_{(2)}$ of the vertical torus $T_2 := G_{(2)}/\Gamma_{(2)}$ subject to the
constraint $g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} \in \Gamma_{(2)}$. For such a point (5.11),
the function $\tilde F$ takes the value
\[ c \, \phi(\pi(g_1)) \prod_{j=0}^{3} F(g_0 g_1^j g_{2,j} \Gamma). \]

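On the support of $\phi$, $g_1$ is a distance $O_M(\varepsilon')$ from the identity (if the
fundamental domain $\Sigma_1$ was chosen in a suitably smooth fashion), and so by
the Lipschitz nature of $F$ and the boundedness of $g_0$ we have
\[ F(g_0 g_1^j g_{2,j} \Gamma) = F(g_0 g_{2,j} \Gamma) + O_M(\varepsilon'). \]
As a consequence, the integral $\int_{G^\Psi/\Gamma^\Psi} \tilde F$ can be expressed as
\[ c \int_{g_0 \in \Sigma_1} \int_{g_1 \in \Sigma_1} \phi(\pi(g_1)) \Bigl( \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) + O_M(\varepsilon') \Bigr) \tag{5.12} \]
where all integrals are with respect to Haar measure.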

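Let $\xi \in \hat T_2$ be a vertical character, i.e. a continuous homomorphism from
$T_2$ to $\mathbb{R}/\mathbb{Z}$. For any $x \in G/\Gamma$, we can define the vertical Fourier
transform $\hat F(x, \xi)$ to be the quantity
\[ \hat F(x, \xi) := \int_{g_2 \in T_2} e(-\xi(g_2)) F(g_2 x). \]
From the Fourier inversion formula we have
\[ \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) = \sum_{\xi \in \hat T_2} |\hat F(g_0\Gamma, \xi)|^2 |\hat F(g_0\Gamma, 3\xi)|^2. \]
In particular, we have$^{14}$
\[ \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) \ge |\hat F(g_0\Gamma, 0)|^4. \]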
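
The "positivity" of the constraint $x_0 - 3x_1 + 3x_2 - x_3 = 0$ can be seen in a toy model. The Python sketch below replaces the vertical torus $T_2$ by the finite cyclic group $\mathbb{Z}/m\mathbb{Z}$ (our simplifying assumption, written additively) and a real-valued function $F$ on it, and checks numerically that the constrained average of $F(x_0)F(x_1)F(x_2)F(x_3)$ equals $\sum_\xi |\hat F(\xi)|^2 |\hat F(3\xi)|^2$, and hence is at least $|\hat F(0)|^4$. The function names and the sample values of $F$ are ours.

```python
import cmath


def fourier(F, xi):
    """Fourier coefficient of F : Z/m -> R at frequency xi (averaged)."""
    m = len(F)
    return sum(F[x] * cmath.exp(-2j * cmath.pi * xi * x / m)
               for x in range(m)) / m


def constrained_average(F):
    """Average of F(x0) F(x1) F(x2) F(x3) over all solutions in (Z/m)^4 of
    x0 - 3 x1 + 3 x2 - x3 = 0, parameterised by the free variables x1, x2, x3."""
    m = len(F)
    total = 0.0
    for x1 in range(m):
        for x2 in range(m):
            for x3 in range(m):
                x0 = (3 * x1 - 3 * x2 + x3) % m
                total += F[x0] * F[x1] * F[x2] * F[x3]
    return total / m ** 3


def fourier_side(F):
    """Sum over xi of |F^(xi)|^2 |F^(3 xi)|^2."""
    m = len(F)
    return sum(abs(fourier(F, xi)) ** 2 * abs(fourier(F, 3 * xi)) ** 2
               for xi in range(m))


F = [0.9, 0.1, 0.4, 0.7, 0.3, 0.8, 0.2]   # arbitrary real test function
lhs = constrained_average(F)
rhs = fourier_side(F)
mean = sum(F) / len(F)
print(lhs, rhs, mean ** 4)
```

Every term on the Fourier side is non-negative, so discarding all but $\xi = 0$ gives the lower bound; for a linear constraint without this symmetric structure the individual terms need not be non-negative.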

Inserting this bound and (5.9) into (5.12), we conclude that
\[ \int_{G^\Psi/\Gamma^\Psi} \tilde F \ge \int_{g_0 \in \Sigma_1} |\hat F(g_0\Gamma, 0)|^4 - O_M(\varepsilon') - o_{N \to \infty; \varepsilon', M}(1). \]
From Fubini's theorem we have
\[ \int_{g_0 \in \Sigma_1} \hat F(g_0\Gamma, 0) = \int_{G/\Gamma} F \]
and from Theorem 1.11, (5.8) and (5.5) we have
\[ \int_{G/\Gamma} F = \alpha + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1). \]
Applying Holder's inequality, we conclude that
\[ \int_{G^\Psi/\Gamma^\Psi} \tilde F \ge \alpha^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1), \]
and so (5.7) follows from (5.10), if $\varepsilon'$ is sufficiently small depending on
$\varepsilon, M$, $\mathcal{F}$ is sufficiently rapid depending on $\varepsilon$, and $N$ is sufficiently large
depending on $\varepsilon', M$.

14 This is the "positivity" alluded to earlier. The argument is essentially
that used in [2] and it is special to the $k = 4$ case, which is of course
consistent with the failure of Theorem 5.1 to extend to $k \ge 5$.

This concludes the proof of the $k = 4$ case of Proposition 5.2 in the special
case when $f_{nil}(n) = F(g(n)\Gamma)$ with $g$ irrational. Unfortunately Theorem
1.2 requires us to deal with the somewhat more general setting of virtual
nilsequences, in which there is dependence on $n \bmod q$ or $n/N$. The extra
details required are fairly routine but notationally irritating. Let us now
suppose, then, that
\[ f_{nil}(n) = F(g(n)\Gamma, n \bmod q, n/N). \tag{5.13} \]
We let $\varepsilon'$ be as before, but modify $\mu$ to now be given by
\[ \mu(d) := q \, 1_{q|d} \, c \, 1_{[-\varepsilon' N, \varepsilon' N]}(d) \, \phi(\pi(g_1)^d), \]
with $c$ still chosen by (5.9). As before, one can use Theorem 1.11 to establish
(5.6).

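Now consider the left-hand side of the expression (5.7) we are to bound
in Proposition 5.2, that is to say
\[ \mathbb{E}_{n,d \in [N]} f_{nil}(n) f_{nil}(n + d) f_{nil}(n + 2d) f_{nil}(n + 3d) \mu(d). \tag{5.14} \]
Splitting into residue classes modulo $q$, we can express this as
\[ c \, \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in [N/q]} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} F\bigl(g(qn + qid + r)\Gamma, r, q(n + id)/N\bigr) \phi(\pi(g_1)^{qd}) + o_{N \to \infty; \varepsilon', M}(1). \]
We partition $[N/q]$ into intervals $P$ of length $\lfloor \varepsilon' N \rfloor$ (plus a remainder of
cardinality $O(\varepsilon' N)$). We can then rewrite the above expression as
\[ c \, \mathbb{E}_{P} \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} F\bigl(g(qn + qid + r)\Gamma, r, q(n + id)/N\bigr) \phi(\pi(g_1)^{qd}) + O(\varepsilon') + o_{N \to \infty; \varepsilon', M}(1). \]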

For each such expression, we can use the Lipschitz nature of $F$ to replace
$q(n + id)/N$ by $q n_P/N$, where $n_P$ is an arbitrary element of $P$, losing only
an error of $O_M(\varepsilon')$. The above expression thus becomes
\[ c \, \mathbb{E}_{P} \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} F\bigl(g(qn + qid + r)\Gamma, r, q n_P/N\bigr) \phi(\pi(g_1)^{qd}) + O_M(\varepsilon') + o_{N \to \infty; \varepsilon', M}(1). \]

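Because the orbit $n \mapsto g(n)\Gamma$ is $(\mathcal{F}(M), N)$-irrational, we see from Lemma
A.8 that the shifted translate $n \mapsto g(q(n + n_P) + r)\Gamma$ is
$(\gg_M \mathcal{F}(M), N)$-irrational. We may then argue as in the previous case and
bound the above average below by
\[ \ge \mathbb{E}_{P} \mathbb{E}_{r \in [q]} \Bigl| \int_{G/\Gamma} F(\cdot, r, q n_P/N) \Bigr|^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1). \]
Using Theorem 1.11 again, we have
\[ \mathbb{E}_{n \in P} f_{nil}(qn + r) = \int_{G/\Gamma} F(\cdot, r, q n_P/N) + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1) \]
and so (5.14) is at least
\[ \ge \mathbb{E}_{P} \mathbb{E}_{r \in [q]} |\mathbb{E}_{n \in P} f_{nil}(qn + r)|^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1). \]
Now from (5.5) and double-counting one has
\[ \mathbb{E}_{P} \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} f_{nil}(qn + r) = \alpha + O(\varepsilon) \]
and so, from Holder's inequality, we deduce that (5.14) is
\[ \ge \alpha^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1). \]
Proposition 5.2 now follows by once again choosing $\varepsilon'$ small enough depending
on $\varepsilon, M$, and choosing $\mathcal{F}$ rapid enough depending on $\varepsilon$, and $N$ sufficiently
large depending on $\varepsilon, \varepsilon', M$.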

6. Proof of Szemeredi's theorem 

We turn now to the proof of Szemeredi's theorem. We deemed this result 
too famous to state in the introduction but, for the sake of fixing notation, 
we recall it here now. It is most natural to establish what might be called 
the "functional" form of the theorem which is a priori a stronger statement 
(though quite easily shown to be equivalent to the standard formulation by 
an argument of Varnavides [53]).

Theorem 6.1 (Szemeredi's theorem). Let $0 < \alpha \le 1$, let $k \ge 3$, and let
$N \ge 1$. If $f : [N] \to [0,1]$ is a function with $\mathbb{E}_{n \in [N]} f(n) \ge \alpha$ then
\[ \Lambda_k(f, f, \dots, f) \gg_{k,\alpha} 1, \]
where
\[ \Lambda_k(f_1, \dots, f_k) := \mathbb{E}_{n \in [N]; d \in [-N,N]} f_1(n) f_2(n + d) \cdots f_k(n + (k-1)d) \]
is the multilinear operator counting arithmetic progressions.
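
For small $N$ the form $\Lambda_k$ can be evaluated directly, which is occasionally useful for sanity checks. The Python sketch below is a brute-force illustration under two conventions that we are assuming rather than quoting: $[N] = \{1, \dots, N\}$, and functions on $[N]$ are extended by zero to the integers. The name `ap_count` is ours.

```python
from fractions import Fraction


def ap_count(fs, N):
    """Lambda_k(f_1, ..., f_k): the average over n in {1..N} and d in {-N..N}
    of f_1(n) f_2(n+d) ... f_k(n+(k-1)d), with each f_i extended by zero
    outside {1..N}. `fs` is a list of k callables."""
    def value(f, x):
        return f(x) if 1 <= x <= N else 0

    total = Fraction(0)
    for n in range(1, N + 1):
        for d in range(-N, N + 1):
            term = Fraction(1)
            for i, f in enumerate(fs):
                term *= value(f, n + i * d)
                if term == 0:
                    break
            total += term
    return total / (N * (2 * N + 1))


one = lambda x: 1
print(ap_count([one, one, one], 3))   # three-term progressions inside {1, 2, 3}
```

With $f \equiv 1$ the numerator simply counts pairs $(n, d)$ for which the whole progression stays inside $[N]$, including the degenerate progressions with $d = 0$.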

We now prove this theorem. We fix $k, \alpha$, and allow implied constants to
depend on these quantities.

As usual, we begin by applying the regularity lemma, Theorem 1.2. In
view of the generalised von Neumann theorem, Theorem 4.1, it is natural to
apply this theorem with $s = k - 2$ (which, as remarked in §4, is the
Cauchy-Schwarz complexity $s = s(\Psi)$ of the system $\Psi$ of linear forms
$n_1, n_1 + n_2, \dots, n_1 + (k-1)n_2$). If we do so, with a small parameter
$\varepsilon > 0$ depending on $\alpha, k$ to be chosen later, and a growth function $\mathcal{F}$
depending on $\alpha, k, \varepsilon$ to be specified later, we obtain a decomposition
\[ f(n) = f_{nil}(n) + f_{sml}(n) + f_{unf}(n) \tag{6.1} \]
where

(i) $f_{nil}$ is a $(\mathcal{F}(M), N)$-irrational degree $\le k - 2$ virtual nilsequence of
complexity $\le M$ and scale $N$;

(ii) $f_{sml}$ has an $L^2[N]$ norm of at most $\varepsilon$;

(iii) $f_{unf}$ has a $U^{k-1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(iv) $f_{nil}$, $f_{sml}$, $f_{unf}$ are all bounded in magnitude by 1; and

(v) $f_{nil}$ and $f_{nil} + f_{sml}$ are non-negative.

As we shall soon see, the contribution of $f_{unf}$ can be quickly discarded
using the generalised von Neumann theorem. If one could also easily discard
the contribution of the small term $f_{sml}$, then matters would simply reduce
to verifying that the contribution of $f_{nil}$ is bounded away from zero, which
would be an easy consequence of the counting lemma. Unfortunately the
small term $f_{sml}$ is only moderately small (of size $O(\varepsilon)$) rather than incredibly
small (e.g. of size $O(1/\mathcal{F}(M))$), and so one has to take a certain amount of
care in dealing with this term, which makes the analysis significantly more
delicate$^{15}$.

15 In the language of ergodic theory, the problem here is that the
characteristic factor is not necessarily a nilsystem, but may merely be a
pro-nilsystem - an inverse limit of nilsystems.

We turn to the details. Much as the key to proving Theorem 1.12 was to
establish Proposition 5.2, the key to establishing Szemeredi's theorem is the
following proposition.

Proposition 6.2 (Szemeredi for $f_{nil}$). Let $f_{nil}$ be as above, and let $\varepsilon > 0$.
Then there exists a function $\mu : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}^+$ supported on the set
\[ \{(n, d) \in \mathbb{Z} \times \mathbb{Z} : d \in [-\varepsilon N, \varepsilon N]; \; n + id \in [N] \text{ for all } i = 0, \dots, k-1\} \tag{6.2} \]
with
\[ \mathbb{E}_{n \in [N]; d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) = 1 + O(\varepsilon) \tag{6.3} \]
and with $\mu$ bounded in magnitude by $O_{M,\varepsilon}(1)$, such that
\[ f_{nil}(n + id) = f_{nil}(n) + O(\varepsilon) \tag{6.4} \]
whenever $0 \le i \le k - 1$ and $\mu(n, d) \ne 0$, and such that one has the
equidistribution property
\[ \mathbb{E}_{n \in [N]} |\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n - id, d)|^2 = 1 + O(\varepsilon) \tag{6.5} \]
for all $0 \le i \le k - 1$.

The crucial feature of Proposition 6.2 for us is that, with the exception
of the uniform bound on $\mu$, the error terms here decay as $\varepsilon \to 0$, even if the
complexity bound $M$ on $f_{nil}$ is extremely large compared to $1/\varepsilon$.

The reader may benefit from a few words about the role of the function
$\mu$. Supposing that $f_{nil}(n) = F(g(n)\Gamma)$ is a genuine nilsequence, this function
acts like a kind of "weight" on progressions $(n, n + d, \dots, n + (k-1)d)$ which
are "almost diagonal" in the sense that $g(n)\Gamma \approx \dots \approx g(n + (k-1)d)\Gamma$
in $G/\Gamma$. The condition (6.5) reflects the fact that the weighted number of
almost diagonal progressions whose $i$th point is $n$ is roughly independent
of $n$. This "non-concentration" of almost diagonal progressions ultimately
means that the error $f_{sml}$ cannot destroy too many of these progressions, a
fact that is crucial for our argument.

Let us assume Proposition 6.2 for now and see how it implies Theorem
6.1. We use (6.1) to expand out the form $\Lambda_k(f, \dots, f)$ into $3^k$ terms. By
Theorem 4.1, any term that involves $f_{unf}$ will be of size $O(1/\mathcal{F}(M))$, thus
\[ \Lambda_k(f, \dots, f) = \Lambda_k(f_{nil} + f_{sml}, \dots, f_{nil} + f_{sml}) + O(1/\mathcal{F}(M)). \tag{6.6} \]
Next, we use the weight $\mu$ arising from Proposition 6.2 and the
non-negativity of $f_{nil} + f_{sml}$ guaranteed by Theorem 1.2 to write
\[ \Lambda_k(f_{nil} + f_{sml}, \dots, f_{nil} + f_{sml}) \gg_{M,\varepsilon} \mathbb{E}_{n \in [N]; d \in [-\varepsilon N, \varepsilon N]} (f_{nil} + f_{sml})(n) \cdots (f_{nil} + f_{sml})(n + (k-1)d) \mu(n, d). \]
We then expand this latter average into the sum of $2^k$ terms. The main
term is
\[ \mathbb{E}_{n \in [N]; d \in [-\varepsilon N, \varepsilon N]} f_{nil}(n) \cdots f_{nil}(n + (k-1)d) \mu(n, d), \tag{6.7} \]
and the other terms are error terms, involving at least one factor of $f_{sml}$.

Consider one of the error terms, involving the factor $f_{sml}(n + id)$ (say) for
some $0 \le i \le k - 1$. We can bound the contribution of this term by
\[ \mathbb{E}_{n \in [N]; d \in [-\varepsilon N, \varepsilon N]} |f_{sml}(n + id)| \mu(n, d), \]
which by a change of variables $n \mapsto n - id$ we can write as
\[ \mathbb{E}_{n \in [N]} |f_{sml}(n)| \, \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n - id, d). \]
By Cauchy-Schwarz, (6.5), and the $L^2[N]$ bound on $f_{sml}$, this is $O(\varepsilon)$.
Finally, we look at the main term (6.7). Using (6.4) we can approximate
\[ f_{nil}(n) \cdots f_{nil}(n + (k-1)d) = f_{nil}(n)^k + O(\varepsilon) \]
and so (using (6.3)) we can write (6.7) as
\[ \mathbb{E}_{n \in [N]} f_{nil}(n)^k \, \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) + O(\varepsilon). \]
Now, from (6.3) one has
\[ \mathbb{E}_{n \in [N]} \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) = 1 + O(\varepsilon) \]
and hence by (6.5)
\[ \mathbb{E}_{n \in [N]} |\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) - 1|^2 = O(\varepsilon). \]
In particular, by Chebyshev's inequality, we have
\[ \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) = 1 + O(\varepsilon^{1/3}) \]
for all $n \in E$, where $E \subseteq [N]$ has cardinality $|E| \ge (1 - O(\varepsilon^{1/3}))N$. Thus,
for $\varepsilon$ small enough, we can bound (6.7) from below by
\[ \gg \mathbb{E}_{n \in [N]} 1_E(n) f_{nil}(n)^k - O(\varepsilon^{1/3}). \]
Now from hypothesis we have $\mathbb{E}_{n \in [N]} f(n) \gg 1$. From Cauchy-Schwarz we
have
\[ \mathbb{E}_{n \in [N]} f_{sml}(n) = O(\varepsilon), \]
and from Theorem 4.1 we also have
\[ \mathbb{E}_{n \in [N]} f_{unf}(n) = O(\varepsilon) \]
if $\mathcal{F}$ is rapid enough. Thus if $\varepsilon$ is small enough we have
$\mathbb{E}_{n \in [N]} f_{nil}(n) \gg 1$, which implies that
$\mathbb{E}_{n \in [N]} 1_E(n) f_{nil}(n) \gg 1$, and hence by Holder's inequality
that $\mathbb{E}_{n \in [N]} 1_E(n) f_{nil}(n)^k \gg 1$. Putting all this together, we conclude
that (6.7) is $\gg 1$ if $\varepsilon$ is small enough, and thus
\[ \Lambda_k(f_{nil} + f_{sml}, \dots, f_{nil} + f_{sml}) \gg_{M,\varepsilon} 1. \]
Inserting this bound into (6.6) we obtain the claim, completing the proof of
Szemeredi's theorem, if $\mathcal{F}$ is chosen sufficiently rapid.

Proof of Proposition 6.2. Let us first establish this in the easy case $k = 3$.
In this case, $f_{nil}$ is essentially quasiperiodic, which will allow us to take
$\mu(n, d)$ to be of the form
\[ \mu(n, d) = 1_{[2\varepsilon N, (1 - 2\varepsilon)N]}(n) \mu(d) \]
with $\mu(d)$ normalised by requiring
\[ \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(d) = 1 + O(\varepsilon). \]
It is then easy to verify that both (6.3) and (6.5) follow from this
normalisation. To establish the remaining claims in Proposition 6.2 we use the
degree $\le 1$ nature of the orbit $n \mapsto g(n)\Gamma$ as in Section 5 to write $f_{nil}$ as
\[ f_{nil}(n) = F(n\theta) \]
for some $\theta \in (\mathbb{R}/\mathbb{Z})^D$ with $D = O_M(1)$ and some
$F : (\mathbb{R}/\mathbb{Z})^D \to \mathbb{C}$ of Lipschitz constant $O_M(1)$. If one then sets $\mu$ to
equal
\[ \mu(d) := \frac{|[-\varepsilon N, \varepsilon N]|}{|B|} 1_B(d), \]
where $B$ is the Bohr set
\[ \{d \in [-\varepsilon N, \varepsilon N] : \mathrm{dist}_{(\mathbb{R}/\mathbb{Z})^D}(d\theta, 0) \le \delta\} \]
and $\delta > 0$ is sufficiently small depending on $\varepsilon, M$, one easily verifies all the
required claims.

We now turn to the case $k > 3$, which is harder because $f_{nil}$ is no longer
quasiperiodic, and so $\mu(n, d)$ will have to depend more heavily on $n$ and not
just on $d$. By arguing as in the previous section we can normalise $g(0)$ to
equal id. We may also assume $N$ is sufficiently large depending on $\varepsilon, M$,
since otherwise we may simply take $\mu(n, d) = 1_{[N]}(n)\delta_0(d)$ where $\delta_0$ is the
Kronecker delta function at 0. We may of course also assume that $\varepsilon < 1$.

We take an $M$-rational Mal'cev basis $X_1, \dots, X_{\dim(G)}$ for the Lie
algebra $\mathfrak{g} = \log G$ adapted to the filtration $G_\bullet$, as described in [28,
Appendix A]. For any radius $r > 0$, we define the "ball" $B_r$ in $G$ to be the
set of all group elements of the form
\[ \exp\Bigl( \sum_{j=1}^{\dim(G)} t_j X_j \Bigr) \tag{6.8} \]
where the $t_j$ are real numbers with $|t_j| \le r^{s+1-i}$ whenever $1 \le i \le s$ and
$\dim(G) - \dim(G_{(i)}) < j \le \dim(G) - \dim(G_{(i+1)})$. Thus, when $r$ is small, $B_r$
is quite "narrow" (of diameter comparable to $r^s$) when projected down to
$G/G_{(2)}$, but is relatively large when restricted to the top order component
$G_{(s)}$ (of diameter comparable to $r$). This type of eccentricity is necessary in
order to make $B_r$ approximately "normal" with respect to conjugations.
Indeed, we have

Lemma 6.3 (Approximate normality). Let $A, \delta > 0$, and let $g \in G$ be such
that $d_G(g, \mathrm{id}) \le A$. Then we have the containments
\[ B_{(1-\delta)r} \subseteq g B_r g^{-1} \subseteq B_{(1+\delta)r} \tag{6.9} \]
whenever $r > 0$ is sufficiently small depending on $A, \delta, M$.

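Proof. We prove the second inclusion only, as the first is similar (and can
also be deduced from the second). The conjugation action $h \mapsto ghg^{-1}$ on $G$
induces a Lie algebra automorphism $\exp(\mathrm{ad}(\log g)) : \mathfrak{g} \to \mathfrak{g}$. If we
conjugate the group element (6.8) by $g$, we thus obtain
\[ \exp\Bigl( \sum_{j=1}^{\dim(G)} t_j \exp(\mathrm{ad}(\log g))(X_j) \Bigr). \]
But if $1 \le i \le s$ and $\dim(G) - \dim(G_{(i)}) < j \le \dim(G) - \dim(G_{(i+1)})$, we
see from the Baker-Campbell-Hausdorff formula (C.2) that
\[ \exp(\mathrm{ad}(\log g))(X_j) = X_j + \sum_{j' = \dim(G) - \dim(G_{(i+1)}) + 1}^{\dim(G)} c_{j,j'} X_{j'} \]
for some coefficients $c_{j,j'}$ of size $O_{A,M}(1)$. Since $|t_j| \le r^{s+1-i}$ for such
$j$, the term $t_j c_{j,j'}$ contributes only $O_{A,M}(r^{s+1-i})$ to the coefficient of
$X_{j'}$, which is smaller by a factor of $O_{A,M}(r)$ than the size permitted for
that coefficient. Collecting all the coefficients together, we obtain the
claim for $r$ small enough. □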

Let $0 < \delta < 1/10$ be a small quantity (depending on $\varepsilon, M$), let $R$ be a large quantity depending on the same parameters, and let $r_0 > 0$ be an even smaller quantity than $\delta$ (depending on $\varepsilon, M, \delta, R$) to be chosen later. For each $r$ with $0 < r < r_0$ take a Lipschitz function $\phi_r : G \to \mathbb{R}^+$ of Lipschitz norm $O_{M,r,\delta}(1)$ which is supported on $B_r$ and equals one on $B_{(1-\delta)r}$, and choose these functions so that $\phi_r \leq \phi_{r'}$ pointwise whenever $0 < r < r' < r_0$. For each such $r$, let $\Phi_r : G/\Gamma \times G/\Gamma \to \mathbb{R}^+$ be the induced function

$$\Phi_r(x,x') := \sum_{g \in G : gx = x'} \phi_r(g).$$

This function $\Phi_r$ is supported near the diagonal of $G/\Gamma \times G/\Gamma$; indeed, $\Phi_r(x,x')$ is only non-zero when $x' \in B_r x$, and furthermore if $x' \in B_{(1-\delta)r} x$ then $\Phi_r(x,x') = 1$. If $r_0$ is chosen sufficiently small depending on $M, \delta$, we conclude from Lemma 6.3 that we have the approximate shift-invariance

$$\Phi_{(1-3\delta)r}(x,x') \leq \Phi_r(gx,gx') \leq \Phi_{(1+3\delta)r}(x,x') \qquad (6.10)$$

whenever $x, x' \in G/\Gamma$ and $g \in G$ is such that $d_G(g, \mathrm{id}) \leq R$ (say).
We now define our cutoff function $\mu = \mu_r$ by

$$\mu_r(n,d) := c_r 1_{q|d} 1_{[k\varepsilon N,(1-k\varepsilon)N]}(n) 1_{[-\delta N,\delta N]}(d) \prod_{i=1}^{k-1} \Phi_r(g(n)\Gamma, g(n+id)\Gamma), \qquad (6.11)$$

where $c_r > 0$ is a normalisation constant to be chosen later. This function, as discussed immediately following the statement of Proposition 6.2, is a smooth cutoff to the set of "almost-diagonal" progressions in $G/\Gamma$. Specifically, $\mu_r$ is supported in (6.2), and also in the region where $g(n+id)\Gamma \in B_r g(n)\Gamma$, $|d| \leq \delta N$, and $q | d$ for $i = 0, \ldots, k-1$. From the Lipschitz nature of $F$ we thus have

$$F(g(n+id)\Gamma, (n+id) \pmod q, (n+id)/N) = F(g(n)\Gamma, n \pmod q, n/N) + O_M(r)$$

for $(n,d)$ in the support of $\mu_r$, which gives (6.4) for $\mu_r$ if $r_0$ is sufficiently small depending on $\varepsilon, M$.



Readers may find it helpful to keep the hierarchy of scales

$$1 \gg 1/k, \alpha \gg \varepsilon \gg 1/M \gg \delta \gg 1/R \gg r_0 > r \gg 1/\mathcal{F}(M) \gg 1/N > 0$$

in mind.
Next, we compute the expectation of $\mu_r(n,d)$, in order to work out what the normalisation constant $c_r$ should be. Observe that

$$\mathbb{E}_{n \in [N], d \in [-\varepsilon N, \varepsilon N]} \mu_r(n,d) = \frac{\delta}{q\varepsilon}(1 + O(\varepsilon)) c_r \times \mathbb{E}_{n \in [k\varepsilon N,(1-k\varepsilon)N]; d \in [-\delta N,\delta N]; q|d} \tilde\Phi_r(g(n)\Gamma, \ldots, g(n+(k-1)d)\Gamma), \qquad (6.12)$$

where $\tilde\Phi_r : (G/\Gamma)^k \to \mathbb{R}^+$ is the function

$$\tilde\Phi_r(x_0, \ldots, x_{k-1}) := \prod_{i=1}^{k-1} \Phi_r(x_0, x_i). \qquad (6.13)$$



Observe that $\tilde\Phi_r$ has a Lipschitz norm of $O_{M,r,\delta}(1)$. Applying Theorem 1.11, we can express (6.12) as

$$\frac{\delta}{q\varepsilon}(1 + O(\varepsilon)) c_r \Big( \int_{G^\Psi/\Gamma^\Psi} \tilde\Phi_r + o_{\mathcal{F}(M) \to \infty; M,r,\delta}(1) + o_{N \to \infty; M,r,\delta}(1) \Big),$$

where $G^\Psi \subseteq G^k$ is the $k$th Hall-Petresco group, that is to say the Leibman group associated to the collection $\Psi = (\psi_0, \ldots, \psi_{k-1})$ of linear forms $\psi_i : (n,d) \mapsto n + id$ for $i = 0, \ldots, k-1$.

The group $G^\Psi$ is a $O_M(1)$-rational subgroup of $G^k$, which itself has complexity $O_M(1)$. Meanwhile, the function $\tilde\Phi_r$ equals $1$ on a ball of radius $r^{O_M(1)}$ centred at the identity, and is bounded by $1$ throughout. We conclude that the quantity

$$v_r := \int_{G^\Psi/\Gamma^\Psi} \tilde\Phi_r$$

obeys the bounds

$$r^{O_M(1)} \ll_M v_r \leq 1.$$

Furthermore, from the properties of the functions $\phi_r$, we have the monotonicity property

$$v_{(1-\delta)r} \leq v_r$$

for any $0 < r < r_0$. Applying the pigeonhole principle (using the fact that polynomial growth is always slower than exponential growth), and choosing $\delta \ll_{\varepsilon,M} 1$ sufficiently small depending on $\varepsilon, M$, one can thus find a radius

$$r_0 > r \gg_{r_0,\varepsilon,\delta,M} 1$$

such that we have the regularity property

$$(1 - O(\varepsilon)) v_r \leq v_{(1-3\delta)r} \leq v_{(1+3\delta)r} \leq (1 + O(\varepsilon)) v_r. \qquad (6.14)$$

Note that this idea of picking a "regular" radius originates, in additive combinatorics, in Bourgain's paper [7]. Fix from now on a value of $r$ with this property. If we then set

$$c_r := \frac{q\varepsilon}{\delta v_r} \qquad (6.15)$$

we conclude that

$$c_r \ll_{M,r_0,\varepsilon} 1 \qquad (6.16)$$

and

$$\mathbb{E}_{n \in [N], d \in [-\varepsilon N,\varepsilon N]} \mu_r(n,d) = 1 + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,r_0}(1) + o_{N \to \infty; M,\varepsilon,r_0}(1).$$

This will give (6.3) provided that $r_0$ is chosen to depend on $M, \varepsilon, \delta$, that $\mathcal{F}$ is sufficiently rapid depending on $\varepsilon$, and $N$ is sufficiently large depending on $M, \varepsilon$.

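Our remaining task, and the most difficult one, is to study the expression in (6.5). That is to say, we fix $0 \leq i \leq k-1$ and consider

$$\mathbb{E}_{n \in [N]} |\mathbb{E}_{d \in [-\varepsilon N,\varepsilon N]} \mu_r(n - id, d)|^2. \qquad (6.17)$$

Using (6.11), we can write this expression as

$$(1 + O(\varepsilon)) \Big(\frac{\delta}{q\varepsilon} c_r\Big)^2 \mathbb{E}_{n \in [k\varepsilon N,(1-k\varepsilon)N]} \mathbb{E}_{d,d' \in [-\delta N,\delta N]; q|d,d'} \tilde\Phi_r^{\otimes 2}(g(n-id)\Gamma, \ldots, g(n+(k-1-i)d)\Gamma, g(n-id')\Gamma, \ldots, g(n+(k-1-i)d')\Gamma)$$

where $\tilde\Phi_r^{\otimes 2} : (G/\Gamma)^k \times (G/\Gamma)^k \to \mathbb{R}^+$ is the tensor square

$$\tilde\Phi_r^{\otimes 2}(x,x') := \tilde\Phi_r(x)\tilde\Phi_r(x').$$

Applying Theorem 1.11 we can thus express (6.17) as

$$(1 + O(\varepsilon)) \Big(\frac{\delta}{q\varepsilon} c_r\Big)^2 \Big( \int_{G^{\Psi^{(i)}}/\Gamma^{\Psi^{(i)}}} \tilde\Phi_r^{\otimes 2} + o_{\mathcal{F}(M) \to \infty; \varepsilon,M,r_0}(1) + o_{N \to \infty; \varepsilon,M,r_0}(1) \Big) \qquad (6.18)$$

where $G^{\Psi^{(i)}} \subseteq G^{2k}$ is the Leibman group associated to the collection

$$\Psi^{(i)} := (\psi_{0,i}, \ldots, \psi_{k-1,i}, \psi'_{0,i}, \ldots, \psi'_{k-1,i})$$

of linear forms

$$\psi_{j,i} : (n,d,d') \mapsto n + (j-i)d$$

and

$$\psi'_{j,i} : (n,d,d') \mapsto n + (j-i)d'$$

for $j = 0, \ldots, k-1$.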

We will be establishing the following claim.

Claim 6.4 (Approximate factorisation). We have

$$\int_{G^{\Psi^{(i)}}/\Gamma^{\Psi^{(i)}}} \tilde\Phi_r^{\otimes 2} = (1 + O(\varepsilon)) v_r^2. \qquad (6.19)$$

Proof of Proposition 6.2 assuming Claim 6.4. Substitute back into (6.18) and use (6.15), (6.16) to conclude that

$$(6.17) = 1 + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty; \varepsilon,M,r_0}(1) + o_{N \to \infty; \varepsilon,M,r_0}(1).$$

This gives the result upon choosing $r_0$ sufficiently small depending on $\varepsilon, M, \delta$, $\mathcal{F}$ sufficiently rapid depending on $\varepsilon$, and $N$ sufficiently large depending on $\varepsilon, M$.
It remains to establish Claim 6.4. For notational simplicity we establish only the claim $i = 0$ (the others being very similar). The intuition behind this claim (and behind the key assertion that the number of almost-diagonal progressions whose $i$th term is $n$ does not depend on $n$) is that the linear forms $(\psi_{0,0}, \ldots, \psi_{k-1,0})$ and $(\psi'_{0,0}, \ldots, \psi'_{k-1,0})$ are almost independent of each other, except for the fact that they are coupled via the obvious identity $\psi_{0,0} = \psi'_{0,0}$.

One way to encode this formally is to note that the Leibman group $G^{\Psi^{(0)}}$ is given by

$$H := \{(x,x') \in G^\Psi \times G^\Psi : x_0 = x'_0\},$$

a product of two copies of the Hall-Petresco group $G^\Psi = \mathrm{HP}^k(G)$ fibred over the zeroth coordinate. To prove this, one may note that the containment $G^{\Psi^{(0)}} \subseteq H$ is obvious. On the other hand, one may compute directly using the dimension formula (3.1) that

$$\dim(G^\Psi) = \dim(G) + \sum_{i=1}^{k-1} \dim(G_{(i)})$$

and

$$\dim(G^{\Psi^{(0)}}) = \dim(G) + 2\sum_{i=1}^{k-1} \dim(G_{(i)})$$

and thus

$$\dim(G^{\Psi^{(0)}}) = 2\dim(G^\Psi) - \dim(G) = \dim(H),$$

and so since both sides are connected, simply-connected nilpotent Lie groups (and so both are homeomorphic to their Lie algebras) we have $G^{\Psi^{(0)}} = H$.
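Write $J_r$ for the integral appearing in (6.19), that is to say

$$J_r := \int_{(x,x') \in G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi : x_0 = x'_0} \tilde\Phi_r^{\otimes 2}(x,x').$$

Let $R$ be some quantity, and suppose that $d_G(g, \mathrm{id}) \leq R$. Then by the almost-invariance property (6.10) we have

$$J_r \leq \int_{(x,x') \in G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi : x_0 = g x'_0} \tilde\Phi_{(1+3\delta)r}^{\otimes 2}(x,x').$$

Integrate this over the ball $B_R := \{g \in G : d_G(g, \mathrm{id}) \leq R\}$. Then we obtain

$$\int \lambda(x,x') \tilde\Phi_{(1+3\delta)r}^{\otimes 2}(x,x') \geq \mathrm{vol}(B_R) J_r,$$

where $\lambda(x,x')$ is the number of $g \in B_R$ for which $x_0 = g x'_0 \pmod \Gamma$, or equivalently

$$\lambda(x,x') := |\Gamma \cap x_0^{-1} B_R x'_0|.$$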
Choose representatives $x_0, x'_0$ in some fundamental domain with $x_0, x'_0 = O_M(1)$. By a volume-packing argument and simple geometry we then have

$$\lambda(x,x') = \mathrm{vol}(B_R)(1 + o_{R \to \infty; M}(1)).$$

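Comparing with the above we have

$$(1 + o_{R \to \infty; M}(1)) v_{(1+3\delta)r}^2 = (1 + o_{R \to \infty; M}(1)) \int_{(x,x') \in (G^\Psi/\Gamma^\Psi)^2} \tilde\Phi_{(1+3\delta)r}^{\otimes 2} \geq J_r$$

and so by (6.14) we have

$$J_r \leq (1 + O(\varepsilon) + o_{R \to \infty; M}(1)) v_r^2.$$

This gives the upper bound for Claim 6.4. The lower bound is proven similarly. This concludes the proof of Proposition 6.2 and thus Theorem 1.12.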

7. On a theorem of Gowers and Wolf 

Our aim in this section is to prove Theorem 1.13, whose statement we recall now.

Theorem 7.1 (Theorem 1.13). Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a collection of linear forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$, and let $s \geq 1$ be an integer such that the polynomials $\psi_1^{s+1}, \ldots, \psi_t^{s+1}$ are linearly independent. Then for any function $f : [N] \to \mathbb{C}$ bounded in magnitude by $1$ (and defined to be zero outside of $[N]$) obeying the bound $\|f\|_{U^{s+1}[N]} \leq \delta$ for some $\delta > 0$, one has

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^t f(\psi_i(n)) = o_{\delta \to 0; s,D,t,\Psi}(1).$$

Henceforth we allow all implied constants to depend on $D, t, s$, without indicating this explicitly. Let $s' = s'(\Psi)$ be the Cauchy-Schwarz complexity of the linear forms $\Psi$, as defined in Theorem 4.1. We may of course assume that $s' > s$, as Theorem 1.13 is immediate otherwise. We may also assume that $N$ is large depending on $\delta$, since otherwise the claim is trivial from a compactness argument.

Let $\varepsilon > 0$ be a small number depending on $\delta$ to be chosen later, and let $\mathcal{F}$ be a growth function depending on $\varepsilon$ to be chosen later. Applying Theorem 1.2 at degree $s'$ (after first decomposing $f$ as a linear combination of $O(1)$ functions taking values in $[0,1]$), we can find a positive quantity $M = O_{\varepsilon,\mathcal{F}}(1)$ and a decomposition

$$f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \qquad (7.1)$$

where:

$f_{\mathrm{nil}}$ is a $(\mathcal{F}(M), N)$-irrational virtual nilsequence of degree $\leq s'$, complexity $\leq M$, and scale $N$;

$f_{\mathrm{sml}}$ has $L^2[N]$ norm at most $\varepsilon$;

$f_{\mathrm{unf}}$ has $U^{s'+1}[N]$ norm at most $1/\mathcal{F}(M)$;

All functions $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}}$ are bounded in magnitude by $O(1)$.
We apply this decomposition to split the expression

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^t f(\psi_i(n)) \qquad (7.2)$$

as the sum of $3^t$ terms, in which each copy of $f$ has been replaced with either $f_{\mathrm{nil}}$, $f_{\mathrm{sml}}$, or $f_{\mathrm{unf}}$.

Any term involving at least one factor of $f_{\mathrm{sml}}$ can be easily seen to be of size $O(\varepsilon)$ by crudely estimating all other factors by $O(1)$. By (4.1), any term involving at least one factor of $f_{\mathrm{unf}}$ is of size $O(1/\mathcal{F}(M))$, which is also of size $O(\varepsilon)$ if $\mathcal{F}$ is chosen to be sufficiently rapidly growing depending on $\varepsilon$. We can therefore express (7.2) as

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^t f_{\mathrm{nil}}(\psi_i(n)) + O(\varepsilon).$$

By hypothesis, we can write

$$f_{\mathrm{nil}}(n) = F(g(n)\Gamma, n \pmod q, n/N)$$

for some $q$ with $1 \leq q \leq M$, some degree $\leq s'$, $(\mathcal{F}(M), N)$-irrational orbit $n \mapsto g(n)\Gamma$ of complexity $\leq M$ and some Lipschitz function $F : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R} \to \mathbb{C}$ of norm at most $M$. The mod $q$ and Archimedean behaviour in $f_{\mathrm{nil}}$ are nothing more than technical annoyances, and we set about eliminating them now. We encourage the reader to work through the heart of the argument, starting at (7.3) below, in the model case $f_{\mathrm{nil}}(n) = F(g(n)\Gamma)$. Let $\varepsilon'$ be a small quantity depending on $\varepsilon, M$ to be chosen later. We partition $[N]$ into progressions $P$ of spacing $q$ and length $\varepsilon' N$, plus a remainder set of size at most $O_M(1)$. We can then rewrite the above expression as

$$\mathbb{E}_{P_1, \ldots, P_D} \mathbb{E}_{n \in P_1 \times \cdots \times P_D} \prod_{i=1}^t f_{\mathrm{nil}}(\psi_i(n)) + O(\varepsilon).$$

We abbreviate $P_1 \times \cdots \times P_D$ as $P$. For a given $P$, observe that as $n$ ranges in $P$, the residue class of $\psi_i(n)$ modulo $q$ is equal to a fixed class $a_{P,i}$, and the value of $\psi_i(n)/N$ differs by at most $O_M(\varepsilon')$ from a fixed number $x_{P,i}$. We may assume that $x_{P,i} \in [0,1]$ for each $i$, otherwise the inner expectation is zero (except for a few "boundary" values of $P$ which give a net contribution of $O_M(\varepsilon')$).

If $\varepsilon'$ is small enough depending on $\varepsilon, M$, the $O_M(\varepsilon')$ error in the above discussion can be absorbed in the $O(\varepsilon)$ error, and so we have

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^t f(\psi_i(n)) = \mathbb{E}_P \mathbb{E}_{n \in P} \prod_{i=1}^t F(g(\psi_i(n))\Gamma, a_{P,i}, x_{P,i}) + O(\varepsilon).$$



Readers may find it helpful to keep the hierarchy of scales

$$1 \gg \varepsilon \gg 1/M, 1/q \gg \varepsilon' \gg 1/\mathcal{F}(M) \gg \delta \gg 1/N > 0$$

in mind.
We now apply Theorem 1.11, which tells us that the right-hand side here is

$$\mathbb{E}_P \int_{G^\Psi/\Gamma^\Psi} \tilde F_P + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1) + o_{N \to \infty; M,\varepsilon,\varepsilon'}(1), \qquad (7.3)$$

where as usual $G^\Psi \leq G^t$ is the Leibman group associated to the system of forms $\Psi = \{\psi_1, \ldots, \psi_t\}$, and here $\tilde F_P : G^\Psi/\Gamma^\Psi \to \mathbb{C}$ is the function

$$\tilde F_P((g_1, \ldots, g_t)\Gamma^\Psi) := \prod_{i=1}^t F(g_i\Gamma, a_{P,i}, x_{P,i}).$$

The heart of the matter is to obtain an upper bound on the quantity $\mathbb{E}_P \int_{G^\Psi/\Gamma^\Psi} \tilde F_P$ appearing in (7.3). To do this, of course, we need to make use of the assumption on the forms $\psi_1, \ldots, \psi_t$, as well as the fact that $\|f\|_{U^{s+1}[N]} \leq \delta$.

The aforementioned assumption, namely that $\psi_1^{s+1}, \ldots, \psi_t^{s+1}$ are linearly independent, implies that $\Psi^{[s+1]}$ is the whole of $\mathbb{R}^t$ which, in view of the definition of the Leibman group $G^\Psi$, implies that $G_{(s+1)}^t \leq G^\Psi$. By Fubini's theorem, we thus have

$$\int_{G^\Psi/\Gamma^\Psi} \tilde F_P = \int_{G^\Psi/\Gamma^\Psi} \tilde F_{P,\leq s}$$

where

$$\tilde F_{P,\leq s}((g_1, \ldots, g_t)\Gamma^\Psi) := \prod_{i=1}^t F_{\leq s}(g_i\Gamma, a_{P,i}, x_{P,i}) \qquad (7.4)$$

and $F_{\leq s}$ is defined by averaging over cosets of $G_{(s+1)}$, specifically

$$F_{\leq s}(g\Gamma, a, x) := \int_{G_{(s+1)}/\Gamma_{(s+1)}} F(g g_{s+1}\Gamma, a, x) \, dg_{s+1}.$$

Since $F$ was Lipschitz with norm $O_M(1)$, we see that $F_{\leq s}$ is Lipschitz with norm $O_M(1)$ also. Also, since $F$ is bounded in magnitude by $O(1)$, so is $F_{\leq s}$.

As the forms $\psi_1^{s+1}, \ldots, \psi_t^{s+1}$ are independent, we see in particular that $\psi_1$ is non-zero. This implies that the projection of $G^\Psi$ to the first coordinate $G$ is surjective. Meanwhile, from (7.4) and the boundedness of $F_{\leq s}$ we have the crude upper bound

$$|\tilde F_{P,\leq s}((g_1, \ldots, g_t)\Gamma^\Psi)| \ll |F_{\leq s}(g_1\Gamma, a_{P,1}, x_{P,1})|.$$

From Fubini's theorem, we obtain the bound

$$\Big|\int_{G^\Psi/\Gamma^\Psi} \tilde F_P\Big| \ll \int_{G/\Gamma} |F_{\leq s}(\cdot, a_{P,1}, x_{P,1})|. \qquad (7.5)$$

To proceed further, we need a crucial smallness estimate on $F_{\leq s}$:
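Proposition 7.2 ($F_{\leq s}$ small in $L^2$). For any $a \in \mathbb{Z}/q\mathbb{Z}$ and $x \in [0,1]$, one has

$$\int_{G/\Gamma} |F_{\leq s}(\cdot, a, x)|^2 \ll O(\varepsilon) + O_M(\varepsilon') + o_{\delta \to 0; M,\varepsilon,\varepsilon'}(1) + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1) + o_{N \to \infty; M,\varepsilon,\varepsilon'}(1).$$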

Proof. By reflection symmetry we may assume that $x \leq 1/2$. We may also round $x$ so that $x = qn_0/N$ for some $n_0 \in [N/2q]$, as the error in doing so can be easily absorbed by the Lipschitz properties of $F_{\leq s}$.

By construction, $F_{\leq s}$ is invariant on $G_{(s+1)}$-cosets, while $F - F_{\leq s}$ integrates to zero on any such coset. In particular, $F_{\leq s}(\cdot, a, x)$ and $(F - F_{\leq s})(\cdot, a, x)$ are orthogonal, and thus

$$\int_{G/\Gamma} |F_{\leq s}(\cdot, a, x)|^2 = \int_{G/\Gamma} F \overline{F_{\leq s}}(\cdot, a, x).$$

Applying Theorem 1.11 (really just the special case of this result asserting that $(g(n)\Gamma)$ is equidistributed, cf. Lemma 3.7) and the Lipschitz nature of $F \overline{F_{\leq s}}$, the right-hand side can be written as

$$\mathbb{E}_{n \in [\varepsilon' N]} F \overline{F_{\leq s}}(g(qn + qn_0 + a)\Gamma, a, x) + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1) + o_{N \to \infty; M,\varepsilon,\varepsilon'}(1).$$

Let $P$ be the progression $\{qn + qn_0 + a : n \in [\varepsilon' N]\}$. Then by a further use of the Lipschitz properties of $F$, we can rewrite the above expression as

$$\mathbb{E}_{n \in P} F(g(n)\Gamma, n \bmod q, n/N) \overline{\psi(n)} + O_M(\varepsilon') + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1) + o_{N \to \infty; M,\varepsilon,\varepsilon'}(1) \qquad (7.6)$$

where

$$\psi(n) := F_{\leq s}(g(n)\Gamma, a, x).$$

Note that, as a consequence of the $G_{(s+1)}$-invariance of $F_{\leq s}$, $\psi(n)$ is a degree $\leq s$ nilsequence of complexity $O_M(1)$. Now by (7.1) we have

$$F(g(n)\Gamma, n \bmod q, n/N) = f(n) - f_{\mathrm{unf}}(n) - f_{\mathrm{sml}}(n).$$

The contribution of $f_{\mathrm{sml}}(n)$ to (7.6) is $O(\varepsilon)$ by the Cauchy-Schwarz inequality. Now consider the contribution of $f$. Recall that $\psi$ is a degree $\leq s$ nilsequence of complexity $O_M(1)$, while $\|f\|_{U^{s+1}[N]} \leq \delta$ by hypothesis. Applying the converse to the inverse conjecture for the Gowers norms (first established in [26], though for a simple proof see [30, Appendix G]), we see that

$$\mathbb{E}_{n \in P} f(n) \overline{\psi(n)} = o_{\delta \to 0; M,\varepsilon,\varepsilon'}(1).$$

Similarly, since $\|f_{\mathrm{unf}}\|_{U^{s'+1}[N]} \leq 1/\mathcal{F}(M)$ and $s' \geq s$, we have

$$\mathbb{E}_{n \in P} f_{\mathrm{unf}}(n) \overline{\psi(n)} = o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1).$$

Putting all of these estimates together, we obtain the claim. □
Applying this bound and (7.5), we can thus bound (7.3) in magnitude by

$$O(\varepsilon) + O_M(\varepsilon') + o_{\delta \to 0; M,\varepsilon,\varepsilon'}(1) + o_{\mathcal{F}(M) \to \infty; M,\varepsilon,\varepsilon'}(1) + o_{N \to \infty; M,\varepsilon,\varepsilon'}(1).$$

Choosing $\varepsilon'$ sufficiently small depending on $M$ and $\varepsilon$, and choosing $\mathcal{F}$ sufficiently rapidly growing depending on $\varepsilon$, and then using the bound $M = O_{\varepsilon,\mathcal{F}}(1)$ (and recalling that $N$ can be chosen large depending on $\delta$), we conclude that

$$\Big|\mathbb{E}_{n \in [N]^D} \prod_{i=1}^t f(\psi_i(n))\Big| \ll \varepsilon$$

whenever $\delta$ is sufficiently small depending on $\varepsilon$. Theorem 1.13 follows.

Remark. It seems certain that one can extend this result to the case when one has $t$ distinct functions $f_1, \ldots, f_t : [N] \to \mathbb{C}$ rather than a single function $f : [N] \to \mathbb{C}$. The main change in the argument would be to use a version of the regularity lemma (Theorem 1.2) valid for several functions simultaneously, in which one regularises the $f_1, \ldots, f_t$ using the same data $M, q, (G/\Gamma, G_\bullet), g(\cdot)$ (but allows each function $f_i$ to be given a separate Lipschitz function $F_i : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R} \to \mathbb{C}$). Such a result could be obtained by straightforward modifications to the proof of Theorem 1.2, but we do not pursue this matter here.

Appendix A. Properties of polynomial sequences 

In this appendix we collect a variety of facts and definitions concerning polynomial sequences in nilpotent groups, all of which were required at some point in the paper proper. We take for granted the definition of a filtration $G_\bullet$, and of the group $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ of polynomial sequences $g : \mathbb{Z}^D \to G$ adapted to $G_\bullet$; these notions were recalled in the introduction.

Taylor expansions. Polynomial sequences may be described in terms of so-called Taylor expansions. In the lemma that follows we make use of the generalised binomial coefficients

$$\binom{(n_1, \ldots, n_D)}{(i_1, \ldots, i_D)} := \binom{n_1}{i_1} \cdots \binom{n_D}{i_D}$$

where

$$\binom{n}{i} := \frac{n(n-1) \cdots (n-i+1)}{i!}.$$

If $\mathbf{i} = (i_1, \ldots, i_D) \in \mathbb{N}^D$ is a $D$-tuple of non-negative integers we define the degree $|\mathbf{i}| := i_1 + \ldots + i_D$. Choose an arbitrary ordering on $\mathbb{N}^D$ with the property that $|\mathbf{i}| \leq |\mathbf{j}|$ whenever $\mathbf{i} \leq \mathbf{j}$.

Lemma A.1 (Taylor expansions). Suppose that $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$. Then there are unique Taylor coefficients $g_{\mathbf{i}} \in G_{(|\mathbf{i}|)}$ with the property that

$$g(\mathbf{n}) = \prod_{\mathbf{i} \in \mathbb{N}^D} g_{\mathbf{i}}^{\binom{\mathbf{n}}{\mathbf{i}}}$$

for all $\mathbf{n} \in \mathbb{Z}^D$. Conversely, every Taylor expansion of this type gives rise to a polynomial sequence $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$.

Remarks. This is proven in [28, Lemma 6.7]. Note that, since $G$ is nilpotent, this is a finite expansion. In the case $D = 1$ (which will feature most prominently in the paper) it takes the form

$$g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}}.$$

Note how, from the presentation of polynomial sequences as Taylor expansions, it is by no means clear (and somewhat remarkable) that they form a group under pointwise multiplication (Theorem 1.6).

Polynomial sequences that vary slowly, in a certain sense, are called smooth. We employ the following definition, which is the same as the one given in the introduction to [28].

Definition A.2 (Smooth sequences). Let $A$ be a positive parameter and let $N \geq 1$ be an integer. Let $\beta \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$. We say that $\beta$ is $(A, N)$-smooth if we have $d_G(\beta(n), \mathrm{id}) \leq A$ and $d_G(\beta(n), \beta(n+1)) \leq A/N$ for all $n \in [N]$.

Here $d_G$ is a metric on the group $G$ constructed using the Mal'cev basis, see [28, Definition 2.2]. The precise definition of this metric is not terribly important for our analysis.

In counterpoint to the notion of a smooth sequence is that of a rational sequence.

Definition A.3 (Rational sequences). Let $A \geq 1$ be an integer, and let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold. Then an element $g \in G$ is $A$-rational if there is some $q$, $1 \leq q \leq A$, such that $g^q \in \Gamma$. If $\gamma \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is a polynomial sequence then we say that it is $A$-rational if $\gamma(n)$ is $A$-rational for every integer $n$.

We have the following basic facts about smooth and rational sequences: 

Lemma A.4 (Basic facts). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of complexity $\leq M_0$. By a "sequence", we mean an element of $\mathrm{poly}(\mathbb{Z}, G_\bullet)$. Then:

(i) The product of two $(A, N)$-smooth sequences is $(O_{M_0,A}(1), N)$-smooth;

(ii) The product of two $A$-rational sequences is $O_{M_0,A}(1)$-rational;

(iii) Any $A$-rational sequence is periodic with period $O_{M_0,A}(1)$.

Proof. For (i), see [28, Lemma 10.1]; for (ii), see [28, Lemma A.11 (v)]; and for (iii), see [28, Lemma A.12 (ii)]. In fact these results hold in the multiparameter setting, with polynomially effective bounds, but we will not need these facts here. □

One could take an "adelic" perspective here and view smooth sequences as those that are local to the Archimedean place $\infty$, while rational sequences are those that are local to finite places $p$.

We turn now to an important new definition for this paper, that of an irrational polynomial sequence. In [28], much emphasis was placed on the notion of an equidistributed polynomial sequence $g : \mathbb{Z} \to G$: one for which the orbit $(g(n)\Gamma)_{n \in [N]}$ is close to equidistributed on $G/\Gamma$. The notion of an irrational sequence implies equidistribution (see Lemma 3.7, which is also a special case of Theorem 1.11), but also encodes an assertion that the filtration $G_\bullet$ is in some sense "minimal" for the sequence. To illustrate the difference, let us think about a simple abelian case in which $G/\Gamma$ is just the unit circle $\mathbb{R}/\mathbb{Z}$ (written additively), and $g : \mathbb{Z} \to \mathbb{R}$ is a polynomial

$$g(n) = \alpha_0 + \alpha_1 \binom{n}{1} + \ldots + \alpha_s \binom{n}{s}. \qquad (A.1)$$

This sequence is adapted to the filtration in which $G_{(i)} = \mathbb{R}$ for $i \leq s$ and $G_{(i)} = \{0\}$ for $i > s$. Qualitatively speaking, $g$ is equidistributed if at least one of $\alpha_1, \ldots, \alpha_s$ is irrational; in contrast, $g$ is irrational with respect to this filtration if it is $\alpha_s$ which is irrational. Note that if $s > 1$ and $\alpha_s$ is rational, then (after removing the periodic component $\alpha_s \binom{n}{s}$ from $g$) $g$ is now adapted to the filtration $G'_\bullet$ in which $G'_{(i)} = \mathbb{R}$ for $i \leq s-1$ and $G'_{(i)} = \{0\}$ for $i > s-1$, which has a strictly smaller total dimension. This basic example is the model for the more sophisticated result in Lemma 2.9.

Let us turn now to the precise definition in the more general setting of Lie group-valued polynomial sequences, in which the role of the $\alpha_i$ is played by the Taylor coefficients of $g$. We need a preliminary definition.

Definition A. 5 (i-horizontal characters). Let (G/T, G 9 ) be a filtered nil- 
manifold of degree ^ s with filtration G, = (G^)^q. Then by an i- 
horizontal character we mean a continuous homomorphism from £j : G^ — > 
E which vanishes on T^-j and on [G^, G^_j^] for any ^ j ^ i. We 

say that such a character is non-trivial if it is not constant. We can assign a notion of complexity by taking a Mal'cev basis adapted to $G_\bullet$, whereupon one has a natural isomorphism $G_{(i)}/G_{(i+1)} \cong \mathbb{R}^k$. Writing $\psi(g_i)$ for the coordinates of $g_i \pmod{G_{(i+1)}}$, any $i$-horizontal character has the form $\xi_i(g_i) = \vec{m} \cdot \psi(g_i)$, for some vector $\vec{m} = (m_1, \ldots, m_k)$ of integers. We may then define the complexity of $\xi_i$ to be $|m_1| + \cdots + |m_k|$.

The list of subgroups on which $\xi_i$ is required to vanish looks rather restrictive and slightly unnatural at first sight. Roughly speaking, this list is intended to isolate that behaviour which genuinely "belongs" to the degree $i$ portion of the filtered nilmanifold, as opposed to arising from those terms of higher or lower degree, or which disappear after quotienting out by the lattice $\Gamma$.

Definition A.6 (Irrationality). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of degree $\leq s$ with filtration $G_\bullet = (G_{(i)})_{i \geq 0}$. Let $g_i \in G_{(i)}$. Let $A, N > 0$. Then we say that $g_i$ is $(A, N)$-irrational in $G_{(i)}$ if for every non-trivial $i$-horizontal character $\xi_i : G_{(i)} \to \mathbb{R}$ of complexity $\leq A$ one has $\|\xi_i(g_i)\|_{\mathbb{R}/\mathbb{Z}} \geq A/N^i$. We say that the sequence $g(n)$ is $(A, N)$-irrational if its $i$th Taylor coefficient $g_i$ is $(A, N)$-irrational in $G_{(i)}$ for each $i$, $1 \leq i \leq s$.
To understand this definition, it is helpful to consider examples. We leave it as an exercise to check that in the abelian case (A.1) this amounts to stipulating that the top coefficient of $g$ is poorly approximated by rationals, thus $\|q\alpha_s\|_{\mathbb{R}/\mathbb{Z}} \geq A/N^s$ whenever $1 \leq q \leq A$.
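
As a hedged illustration of the abelian criterion just described, the following Python sketch tests whether a rational top coefficient $\alpha_s$ satisfies $\|q\alpha_s\|_{\mathbb{R}/\mathbb{Z}} \geq A/N^s$ for every $1 \leq q \leq A$. The function names `dist_to_nearest_integer` and `is_irrational_top_coefficient` are illustrative assumptions and do not come from the paper; exact `Fraction` arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import floor


def dist_to_nearest_integer(x: Fraction) -> Fraction:
    """Return ||x||_{R/Z}, the distance from x to the nearest integer."""
    frac = x - floor(x)  # fractional part, lies in [0, 1)
    return min(frac, 1 - frac)


def is_irrational_top_coefficient(alpha_s: Fraction, s: int, A: int, N: int) -> bool:
    """Check ||q * alpha_s||_{R/Z} >= A / N^s for all integers 1 <= q <= A.

    This is only the abelian model case (A.1); it is a sketch, not the
    general definition for nilmanifolds.
    """
    threshold = Fraction(A, N ** s)
    return all(
        dist_to_nearest_integer(q * alpha_s) >= threshold
        for q in range(1, A + 1)
    )
```

For instance, with $\alpha_s = 1/3$, $s = 2$, $A = 3$, $N = 10$ the check fails, because $q = 3$ gives $\|3 \cdot \tfrac13\|_{\mathbb{R}/\mathbb{Z}} = 0$; with $\alpha_s = 2/7$, $s = 1$, $A = 2$, $N = 10$ it succeeds.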

A second interesting case to examine is that in which $g(n) = g^n$ is a linear polynomial sequence adapted to the lower central series filtration $(G_i)_{i \geq 0}$. For the lower central series filtration there are no nontrivial $i$-horizontal characters when $i \geq 2$, and $1$-horizontal characters are the same thing as horizontal characters in the sense of [28, Definition 1.5]. It follows from this and [28, Theorem 1.16] that $g(n)$ is irrational if and only if $(g(n)\Gamma)_{n \in \mathbb{Z}}$ is equidistributed. Now polynomial sequences that are not linear do not arise naturally in ergodic-theoretic settings such as those considered in [3, 39], and thus the equivalence of the notions of "irrational" and "equidistributed" in this setting explains why the former concept has not appeared in the literature before. The need for it is a new feature of the quantitative world, as is the need for polynomial nilsequences themselves, for reasons explained in [28, §1].

The following third example is also edifying. Take $g(n)$ to be any polynomial sequence of the form

$$g(n) = \begin{pmatrix} 1 & \alpha n & \gamma n^2 \\ 0 & 1 & \beta n \\ 0 & 0 & 1 \end{pmatrix}.$$

This sequence is a polynomial sequence adapted to the lower central series filtration $G_0 = G_1 = G$, $G_2 = [G, G]$, $G_3 = \{\mathrm{id}\}$, and it will be equidistributed in that setting for generic $\alpha, \beta, \gamma$. However $g$ is also a polynomial sequence with respect to some much flabbier filtrations, for example the one in which $G_{(0)} = G_{(1)} = G_{(2)} = \cdots = G_{(10)} = G$, $G_{(11)} = \cdots = G_{(100)} = [G, G]$ and $G_{(i)} = \{\mathrm{id}\}$ for $i \geq 101$. It is easy to check that $g$ is not irrational in this setting, and indeed irrationality is somehow detecting the fact that a given filtration $G_\bullet$ is minimal for $g$. This point is quite clear in the proof of Lemma 2.9 (which itself depends on Lemma A.7 below), where the failure of a sequence to be irrational is used to create a coarser filtration for a polynomial sequence related to $g$.
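
To make the Taylor expansion of this third example concrete, here is a small Python sketch. It assumes the sequence is the upper unitriangular matrix with entries $\alpha n$, $\beta n$ above the diagonal and $\gamma n^2$ in the corner, and checks that it equals $g_1^{\binom{n}{1}} g_2^{\binom{n}{2}}$ with $g_1 = g(1)$ and $g_2$ the central element with corner entry $2\gamma - \alpha\beta$. That value of $g_2$ is our own computation rather than a formula quoted from the paper, and all function names are illustrative.

```python
from fractions import Fraction


def heisenberg(a, b, c):
    """Upper unitriangular 3x3 matrix with a, b above the diagonal and c in the corner."""
    return [[1, a, c], [0, 1, b], [0, 0, 1]]


def mat_mul(x, y):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_pow(x, e):
    """x raised to a non-negative integer power e by repeated multiplication."""
    result = heisenberg(0, 0, 0)  # identity matrix
    for _ in range(e):
        result = mat_mul(result, x)
    return result


def g_direct(n, alpha, beta, gamma):
    """The sequence written out entrywise."""
    return heisenberg(alpha * n, beta * n, gamma * n * n)


def g_taylor(n, alpha, beta, gamma):
    """The same sequence as g_1^{binom(n,1)} g_2^{binom(n,2)}, for n >= 0."""
    g1 = heisenberg(alpha, beta, gamma)               # degree-1 Taylor coefficient
    g2 = heisenberg(0, 0, 2 * gamma - alpha * beta)   # central, lies in [G, G]
    return mat_mul(mat_pow(g1, n), mat_pow(g2, n * (n - 1) // 2))
```

Since $g_2$ lies in $[G, G] = G_2$, this agrees with the requirement in Lemma A.1 that the $i$th Taylor coefficient lies in the $i$th group of the filtration. The sketch only handles $n \geq 0$, to avoid implementing matrix inverses.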

Lemma A.7. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of degree $\leq s$ with filtration $G_\bullet = (G_{(i)})_{i \geq 0}$. Suppose that $g$ is not $(A, N)$-irrational. Then there is an index $i$, $1 \leq i \leq s$, such that the $i$th Taylor coefficient $g_i$ factors as $\beta_i g'_i \gamma_i$, where $\beta_i, g'_i, \gamma_i \in G_{(i)}$, $g'_i$ lies in the kernel of some $i$-horizontal character $\xi_i : G_{(i)} \to \mathbb{R}$ of complexity at most $A$, $d_G(\beta_i, \mathrm{id}) = O_{A,M}(N^{-i})$ and $\gamma_i$ is $O_{A,M}(1)$-rational.

Proof. The proof is (unsurprisingly) extremely similar to that of [28, Lemma 7.9]. Reversing the definition of irrational polynomial sequence, we see that there is an index $i$ together with an $i$-horizontal character $\xi_i : G_{(i)} \to \mathbb{R}$ such that $\|\xi_i(g_i)\|_{\mathbb{R}/\mathbb{Z}} \leq A/N^i$. It is convenient at this point to work in a Mal'cev coordinate system adapted to $G_\bullet$, whereby $G_{(i)}/G_{(i+1)}$ may be identified with $\mathbb{R}^k$ and $\Gamma_{(i)}/\Gamma_{(i+1)}$ with $\mathbb{Z}^k$. If $g_i \in G_{(i)}$ then, as above, we write $\psi(g_i) \in \mathbb{R}^k$ for the corresponding coordinates. Then $\xi_i$ has the form $\xi_i(g_i) = \vec{m} \cdot \psi(g_i)$ for some vector $\vec{m} = (m_1, \ldots, m_k)$ of integers with $|m_1| + \cdots + |m_k| \leq A$. Now by assumption we have $\|\vec{m} \cdot \psi(g_i)\|_{\mathbb{R}/\mathbb{Z}} \leq A/N^i$, and therefore $\vec{m} \cdot \psi(g_i) = r + O(A/N^i)$ for some integer $r$. It follows from simple linear algebra that we may write $\psi(g_i) = t + u + v$, where $\vec{m} \cdot u = 0$, the coordinates of $v$ lie in $\frac{1}{Q}\mathbb{Z}$ for some $Q = O_A(1)$ and each coordinate of $t$ is $O_A(1/N^i)$. Now choose $\beta_i \in G_{(i)}$ in such a way that $\psi(\beta_i) = t$ and $d_G(\beta_i, \mathrm{id}) = O_{A,M}(1/N^i)$, choose an $O_{A,M}(1)$-rational element $\gamma_i \in G_{(i)}$ with $\psi(\gamma_i) = v$, and finally choose $g'_i$ so that $g_i = \beta_i g'_i \gamma_i$. Then one automatically has $\psi(g'_i) = u$, which means that $g'_i$ lies in the kernel of the $i$-horizontal character $\xi_i$. □

Finally, we record a convenient scaling lemma. 

Lemma A.8 (Scaling lemma). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of complexity $\leq M$. If $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $(A, N)$-irrational, $r \in [-N, N]$, and $1 \leq q \leq M$, then the sequence $n \mapsto g(nq + r)$ is $(\gg_{M,\varepsilon} A, \varepsilon N)$-irrational for any $\varepsilon > 0$.

Proof. We need to show that the $i$th Taylor coefficient of $n \mapsto g(nq + r)$ is $(\gg_{M,\varepsilon} A, \varepsilon N)$-irrational for each $i \geq 0$. Note that we may assume $i \leq M$ since the filtered nilmanifold has degree $\leq M$.

Fix $i$. We may quotient out the nilmanifold by the normal subgroups $G_{(i+1)}$ and $[G_{(j)}, G_{(i-j)}]$ for $0 \leq j \leq i$, since these do not affect the irrationality of the $i$th coefficient. We may then expand $g$ as a Taylor series

$$g(n) = \prod_{j=0}^{i} g_j^{\binom{n}{j}},$$

and thus

$$g(qn + r) = \prod_{j=0}^{i} g_j^{\binom{qn+r}{j}}.$$

Expanding out the binomial coefficient and using many applications of the Baker-Campbell-Hausdorff formula, we obtain

$$g(qn + r) = \Big(\prod_{j=0}^{i-1} (g'_j)^{\binom{n}{j}}\Big) g_i^{q^i \binom{n}{i}}$$

for some $g'_j \in G_{(j)}$; the point being that the Baker-Campbell-Hausdorff term cannot generate any terms involving polynomials in $n$ of degree $i$ or higher due to the fact that the groups $G_{(i+1)}$ and $[G_{(j)}, G_{(i-j)}]$ have been quotiented out. As a consequence, we see that the $i$th Taylor coefficient of $n \mapsto g(qn+r)$ is $g_i^{q^i}$ (that is, $q^i g_i$ in additive notation), and the claim is easily verified. □
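
The scaling of the top Taylor coefficient can be checked numerically in the abelian model case (A.1), where no quotienting is needed for the top degree. The Python sketch below is an illustration under that assumption only: it recovers Taylor coefficients in the binomial basis by forward differences and confirms that the degree-$s$ coefficient of $n \mapsto g(qn + r)$ is $q^s \alpha_s$. The helper names are hypothetical, not taken from the paper.

```python
from fractions import Fraction


def gen_binom(n: int, i: int) -> Fraction:
    """Generalised binomial coefficient n(n-1)...(n-i+1)/i!, valid for any integer n."""
    result = Fraction(1)
    for k in range(i):
        result = result * (n - k) / (k + 1)
    return result


def poly_from_taylor(alphas):
    """Return n -> sum_j alphas[j] * binom(n, j), the abelian form (A.1)."""
    return lambda n: sum(a * gen_binom(n, j) for j, a in enumerate(alphas))


def taylor_coefficients(p, degree: int):
    """Recover binomial-basis coefficients of p via forward differences at 0."""
    values = [p(n) for n in range(degree + 1)]
    coeffs = []
    for _ in range(degree + 1):
        coeffs.append(values[0])
        # Replace the list by its first differences.
        values = [values[k + 1] - values[k] for k in range(len(values) - 1)]
    return coeffs
```

For example, taking `alphas = [Fraction(1, 7), Fraction(2, 5), Fraction(3, 11)]`, `q = 3` and `r = -4`, the last entry of `taylor_coefficients(lambda n: poly_from_taylor(alphas)(q * n + r), 2)` equals `q**2 * alphas[2]`. The lower coefficients change in a more complicated way, which is why the proof above first quotients out the other subgroups.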



54 BEN GREEN AND TERENCE TAO 

Appendix B. A multiparameter equidistribution result 

The purpose of this appendix is to prove Theorem 3.6, which we recall here again.

Theorem 3.6. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of complexity $\leq M$ and that $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a polynomial sequence for some $D \leq M$. Suppose that $\Lambda \subseteq \mathbb{Z}^D$ is a lattice of index $\leq M$, that $n_0 \in \mathbb{Z}^D$ has magnitude $\leq M$, and that $P \subseteq [-N, N]^D$ is a convex body. Suppose that $\delta > 0$, and that



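$$\Big|\sum_{n \in (n_0 + \Lambda) \cap P} F(g(n)\Gamma) - \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} \int_{G/\Gamma} F\Big| \geq \delta N^D \|F\|_{\mathrm{Lip}}$$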



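for some Lipschitz function $F : G/\Gamma \to \mathbb{C}$. Then there is a nontrivial homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$, has complexity $O_M(1)$ and such that

$$\|\eta \circ g\|_{C^\infty([N]^D)} = O_{\delta,M}(1).$$

Recall from [28, Definition 8.2] that the norm $\|g\|_{C^\infty([N]^D)}$ of a polynomial sequence $g : [N]^D \to \mathbb{R}$ is given by the formula

$$\|g\|_{C^\infty([N]^D)} = \sup_{\mathbf{i} \neq 0} N^{|\mathbf{i}|} \|g_{\mathbf{i}}\|_{\mathbb{R}/\mathbb{Z}}$$

where $g_{\mathbf{i}}$ are the Taylor coefficients of $g$, thus

$$g(\mathbf{n}) = \sum_{\mathbf{i}} \binom{\mathbf{n}}{\mathbf{i}} g_{\mathbf{i}}.$$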

We now prove the theorem, allowing all implied constants to depend on $\delta$ and $M$. We may assume that $N$ is sufficiently large depending on $\delta, M$, since the claim is trivial otherwise. A simple volume packing argument (using [29, Corollary A.2], for example, to control the boundary terms) shows that

$$|(n_0 + \Lambda) \cap P| = \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} + o_{N \to \infty}(N^D).$$

As a consequence, for $N$ large enough we may subtract off the mean of $F$ and normalise $F$ to have Lipschitz norm $1$ and mean zero, thus

$$\Big|\sum_{n \in (n_0 + \Lambda) \cap P} F(g(n)\Gamma)\Big| \gg N^D.$$

As $\Lambda$ has index $\leq M$ in $\mathbb{Z}^D$, it contains the sublattice $q\mathbb{Z}^D$ for some positive integer $q = O(1)$. By the pigeonhole principle, we may thus find $n_1 \in \mathbb{Z}^D$ of magnitude $O(1)$ such that

$$\Big|\sum_{n \in (n_1 + q\mathbb{Z}^D) \cap P} F(g(n)\Gamma)\Big| \gg N^D,$$

and thus

$$\Big|\sum_{n \in \mathbb{Z}^D \cap P'} F(g(qn + n_1)\Gamma)\Big| \gg N^D$$

for some convex body $P'$ contained in a ball of radius $O(N)$ centered at the origin.

By subdividing $P'$ into cubes of sidelength $\varepsilon N$ for some sufficiently small $\varepsilon > 0$ (and again using [29, Corollary A.2] to control the boundary terms), and then applying the pigeonhole principle, we see that

$$\Big|\sum_{n \in \mathbb{Z}^D \cap (n_2 + [\varepsilon N]^D)} F(g(qn + n_1)\Gamma)\Big| \gg N^D$$

for some $\varepsilon \gg 1$ and $n_2 = O(N)$. We can rearrange this as

$$\Big|\sum_{n \in \mathbb{Z}^D \cap [\varepsilon N]^D} F(g(qn + n_3)\Gamma)\Big| \gg N^D$$

for some $n_3 = O(N)$.

We may now invoke [28, Theorem 8.6] to conclude that there exists a nontrivial homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$, has complexity $O(1)$ and such that

$$\|\eta \circ g(q \cdot + n_3)\|_{C^\infty([\varepsilon N]^D)} \ll 1.$$

Applying [28, Lemma 8.4] we conclude that

$$\|Q\eta \circ g(\cdot + n_3)\|_{C^\infty([N]^D)} \ll 1$$

for some non-negative integer $Q = O(1)$. Shifting the Taylor expansion by $n_3$, we conclude that

$$\|Q\eta \circ g\|_{C^\infty([N]^D)} \ll 1.$$

The claim follows (with $\eta$ replaced by $Q\eta$).

Appendix C. The Baker-Campbell-Hausdorff formula 

Let $G$ be a connected, simply connected nilpotent Lie group, and let $\exp : \mathfrak{g} \to G$ and $\log : G \to \mathfrak{g}$ be the associated exponential and logarithm maps between $G$ and its Lie algebra $\mathfrak{g}$. The Baker-Campbell-Hausdorff formula asserts that

$$\exp(X_1)\exp(X_2) = \exp\Big(X_1 + X_2 + \frac{1}{2}[X_1, X_2] + \sum_\alpha c_\alpha X_\alpha\Big)$$

for any $X_1, X_2 \in \mathfrak{g}$, where $\alpha$ ranges over a finite set of labels, $c_\alpha$ are real constants, and $X_\alpha$ are iterated Lie brackets of $k_1 = k_{1,\alpha}$ copies of $X_1$ and $k_2 = k_{2,\alpha}$ copies of $X_2$ where $k_1, k_2 \geq 1$ and $k_1 + k_2 \geq 3$.
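
In the 2-step case every bracket of three or more elements vanishes, so the formula reduces to $\exp(X_1)\exp(X_2) = \exp(X_1 + X_2 + \tfrac12[X_1, X_2])$. The Python sketch below checks this identity exactly for strictly upper triangular $3 \times 3$ matrices (the Heisenberg Lie algebra). It is an illustration only: the helper names are our own, and the truncated exponential relies on $X^3 = 0$ for such matrices.

```python
from fractions import Fraction

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def strictly_upper(a, b, c):
    """Element of the Heisenberg Lie algebra: strictly upper triangular 3x3 matrix."""
    return [[0, a, c], [0, 0, b], [0, 0, 0]]


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_add(x, y):
    return [[x[i][j] + y[i][j] for j in range(3)] for i in range(3)]


def mat_scale(t, x):
    return [[t * x[i][j] for j in range(3)] for i in range(3)]


def exp_nilpotent(x):
    """exp(X) = I + X + X^2/2, exact here because X^3 = 0."""
    return mat_add(mat_add(IDENTITY, x), mat_scale(Fraction(1, 2), mat_mul(x, x)))


def bracket(x, y):
    """Lie bracket [X, Y] = XY - YX."""
    return mat_add(mat_mul(x, y), mat_scale(-1, mat_mul(y, x)))


def bch_step_two(x, y):
    """X + Y + [X, Y]/2, the full Baker-Campbell-Hausdorff series in a 2-step algebra."""
    return mat_add(mat_add(x, y), mat_scale(Fraction(1, 2), bracket(x, y)))
```

With any two such matrices `x1`, `x2`, the products `mat_mul(exp_nilpotent(x1), exp_nilpotent(x2))` and `exp_nilpotent(bch_step_two(x1, x2))` agree entry by entry. In higher-step groups the additional terms $c_\alpha X_\alpha$ no longer vanish and this sketch does not apply.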

Using this formula, it is a routine matter to see that for any $g_1, g_2 \in G$
and $x \in \mathbb{R}$, we have

\[ (g_1 g_2)^x = g_1^x g_2^x \prod_{\alpha} g_\alpha^{Q_\alpha(x)} \tag{C.1} \]

where $\alpha$ ranges over a finite set of labels, each $g_\alpha$ is an iterated commutator of
$k_1 = k_{1,\alpha}$ copies of $g_1$ and $k_2 = k_{2,\alpha}$ copies of $g_2$, where $k_1, k_2 \geq 1$ and
$k_1 + k_2 \geq 2$, and the $Q_\alpha : \mathbb{R} \to \mathbb{R}$ are polynomials of degree at most $k_1 + k_2$
with no constant term.

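In a similar vein, for any $g_1, g_2 \in G$ and $x_1, x_2 \in \mathbb{R}$, we have the formula

\[ [g_1^{x_1}, g_2^{x_2}] = [g_1, g_2]^{x_1 x_2} \prod_{\alpha} g_\alpha^{P_\alpha(x_1, x_2)} \tag{C.2} \]

where $\alpha$ ranges over a finite set of labels, each $g_\alpha$ is an iterated commutator of
$k_1 = k_{1,\alpha}$ copies of $g_1$ and $k_2 = k_{2,\alpha}$ copies of $g_2$, where $k_1, k_2 \geq 1$ and
$k_1 + k_2 \geq 3$, and the $P_\alpha : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ are polynomials of degree at most $k_1$ in
$x_1$ and at most $k_2$ in $x_2$ which vanish when $x_1 = 0$ or $x_2 = 0$.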
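
As a sanity check on the Baker-Campbell-Hausdorff formula, the following sketch verifies it exactly in the simplest non-abelian case, the Heisenberg group of upper unitriangular $3 \times 3$ matrices. There every bracket of three or more elements vanishes, so the sum over $\alpha$ is empty and the formula reduces to $\exp(X_1)\exp(X_2) = \exp(X_1 + X_2 + \frac{1}{2}[X_1, X_2])$. The helper names (`heisenberg_lie`, `exp_nilpotent`, `bch_two_step`) are illustrative choices and do not come from the paper.

```python
from fractions import Fraction

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


def mat_scale(c, a):
    return [[c * a[i][j] for j in range(3)] for i in range(3)]


def heisenberg_lie(x, y, z):
    """Strictly upper triangular matrix with entries x, y above the diagonal and z in the corner."""
    return [[0, x, z], [0, 0, y], [0, 0, 0]]


def exp_nilpotent(X):
    """exp(X) = I + X + X^2/2, which is exact here because X^3 = 0."""
    return mat_add(mat_add(IDENTITY, X), mat_scale(Fraction(1, 2), mat_mul(X, X)))


def bracket(X, Y):
    """Lie bracket [X, Y] = XY - YX."""
    return mat_add(mat_mul(X, Y), mat_scale(-1, mat_mul(Y, X)))


def bch_two_step(X, Y):
    """X + Y + [X, Y]/2: the BCH exponent when all higher brackets vanish."""
    return mat_add(mat_add(X, Y), mat_scale(Fraction(1, 2), bracket(X, Y)))


X = heisenberg_lie(1, 2, 3)
Y = heisenberg_lie(-4, 5, Fraction(1, 3))
lhs = mat_mul(exp_nilpotent(X), exp_nilpotent(Y))
rhs = exp_nilpotent(bch_two_step(X, Y))
print(lhs == rhs)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is an identity rather than a floating-point approximation.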
Abstract. Szemerédi's regularity lemma can be viewed as a rough
structure theorem for arbitrary dense graphs, decomposing such graphs
into a structured piece, a small error, and a uniform piece. We establish
an arithmetic regularity lemma that similarly decomposes bounded
functions $f : [N] \to \mathbb{C}$ into a (well-equidistributed, virtual) $s$-step
nilsequence, an error which is small in $L^2$ and a further error which is
minuscule in the Gowers $U^{s+1}$-norm, where $s \geq 1$ is a parameter. We
then establish a complementary arithmetic counting lemma that counts
arithmetic patterns in the nilsequence component of $f$.

We provide a number of applications of these lemmas: a proof of
Szemerédi's theorem on arithmetic progressions, a proof of a conjecture
of Bergelson, Host and Kra, and a generalisation of certain results of
Gowers and Wolf.

Our result is dependent on the inverse conjecture for the Gowers $U^{s+1}$
norm, recently established for general $s$ by the authors and T. Ziegler.

To Endre Szemerédi on the occasion of his 70th birthday.
1. Introduction

Szemerédi's celebrated regularity lemma [48, 49] is a fundamental tool in
graph theory; see for instance [36] for a survey of some of its many
applications. It is often described as a structure theorem for graphs
$G = (V, E)$, but one may also view it as a decomposition for arbitrary
functions $f : V \times V \to [0,1]$. For instance, one can recast the regularity
lemma in the following "analytic" form. Define a growth function to be any
monotone increasing function $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ with $\mathcal{F}(M) \geq M$ for all $M$.

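Lemma 1.1 (Szemerédi regularity lemma, analytic form). Let $V$ be a finite
vertex set, let $f : V \times V \to [0, 1]$ be a function, let $\varepsilon > 0$, and let
$\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth function. Then there exists a positive integer$^1$
$M = O_{\varepsilon,\mathcal{F}}(1)$ and a decomposition

\[ f = f_{\mathrm{str}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \tag{1.1} \]

of $f$ into functions $f_{\mathrm{str}}, f_{\mathrm{sml}}, f_{\mathrm{unf}} : V \times V \to [-1, 1]$ such that:

(i) ($f_{\mathrm{str}}$ structured) $V$ can be partitioned into $M$ cells $V_1, \dots, V_M$, such
that $f_{\mathrm{str}}$ is constant on $V_i \times V_j$ for all $i, j$ with $1 \leq i, j \leq M$;

(ii) ($f_{\mathrm{sml}}$ small) The quantity$^2$ $\|f_{\mathrm{sml}}\|_{L^2(V \times V)} := (\mathbb{E}_{v,w \in V} |f_{\mathrm{sml}}(v, w)|^2)^{1/2}$
is at most $\varepsilon$.

(iii) ($f_{\mathrm{unf}}$ very uniform) The box norm $\|f_{\mathrm{unf}}\|_{\Box^2(V \times V)}$, defined to be the
quantity

\[ \bigl( \mathbb{E}_{v_1, v_2, w_1, w_2 \in V} f_{\mathrm{unf}}(v_1, w_1) f_{\mathrm{unf}}(v_1, w_2) f_{\mathrm{unf}}(v_2, w_1) f_{\mathrm{unf}}(v_2, w_2) \bigr)^{1/4}, \]

is at most $1/\mathcal{F}(M)$.

(iv) (Nonnegativity) $f_{\mathrm{str}}$ and $f_{\mathrm{str}} + f_{\mathrm{sml}}$ take values in $[0, 1]$.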

Informally, this regularity lemma decomposes any bounded function into
a structured part, a small error, and an extremely uniform error. While
this formulation does not, at first sight, look much like the usual
regularity lemma, it easily implies that result: see [53]. The idea of
formulating the regularity lemma with an arbitrary growth function $\mathcal{F}$ first
appears in [1], and is also very useful for generalisations of the regularity
lemma to hypergraphs. See, for example, [52]. The bound on $M$ turns out to
essentially be an iterated version of the growth function $\mathcal{F}$, with the number
of iterations being polynomial in $1/\varepsilon$. In applications, one usually selects
the growth function to be exponential in nature, which then makes $M$
essentially tower-exponential in $1/\varepsilon$. See [51, 54] for a general discussion of
these sorts of structure theorems and their applications in combinatorics.
See also [42] for a related analytical perspective on the regularity lemma.

In applications the regularity lemma is often paired with a counting lemma
that allows one to control various expressions involving the function $f$. For
example, one might consider the expression

\[ \mathbb{E}_{u,v,w \in V} f(u, v) f(v, w) f(w, u), \tag{1.2} \]

which counts triangles in $V$ weighted by $f$. Applying the decomposition
(1.1) splits expressions such as (1.2) into multiple terms (in this instance,
27 of them). The key fact, which is a slightly non-trivial application of
the Cauchy-Schwarz inequality, is that the terms involving the
box-norm-uniform error $f_{\mathrm{unf}}$ are negligible if the growth function $\mathcal{F}$ is
chosen to grow rapidly enough. The terms involving the small error $f_{\mathrm{sml}}$ are
somewhat small, but one often has to carefully compare those errors against
the main term (which only involves $f_{\mathrm{str}}$) in order to get a non-trivial
bound on the final expression (1.2). In particular, one often needs to exploit
the positivity of $f_{\mathrm{str}}$ and $f_{\mathrm{str}} + f_{\mathrm{sml}}$ to first localise expressions such as
(1.2) to a small region (such as the portion of a graph between a "good"
triple $V_i, V_j, V_k$ of cells in the partition of $V$ associated to $f_{\mathrm{str}}$) before one
can obtain a useful estimate.

$^1$ As usual, we use $O(X)$ to denote a quantity bounded in magnitude by $CX$ for some
absolute constant $C$; if we need $C$ to depend on various parameters, we will indicate this
by subscripts. Thus for instance $O_{\varepsilon,\mathcal{F}}(1)$ is a quantity bounded in magnitude by some
expression $C_{\varepsilon,\mathcal{F}}$ depending on $\varepsilon, \mathcal{F}$.

$^2$ We use here the expectation notation $\mathbb{E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)$ for any finite
non-empty set $A$, where $|A|$ denotes the cardinality of $A$.

The graph regularity and counting lemmas can be viewed as the first non- 
trivial member of a hierarchy of hypergraph regularity and counting lemmas, 
see e.g. Q21 12Q1 S3 HH [52] . The formulation in [52] is particularly close 
to the formulation given in Theorem 1.1. These lemmas are suitable for
controlling higher order expressions such as

\[ \mathbb{E}_{u,v,w,x \in V} f(u, v, w) f(v, w, x) f(w, x, u) f(x, u, v). \]

Our objective in this paper is to introduce an analogous hierarchy of such
regularity and counting lemmas (one for each integer $s \geq 1$), in arithmetic
situations. Here, the aim is to decompose a function $f : [N] \to [0, 1]$ defined
on an arithmetic progression $[N] := \{1, \dots, N\}$ instead of a graph. One is
interested in counting averages such as

\[ \mathbb{E}_{n,r \in [N]} f(n) f(n + r) f(n + 2r), \]

which counts 3-term arithmetic progressions weighted by $f$, as well as higher
order expressions such as

\[ \mathbb{E}_{n,r \in [N]} f(n) f(n + r) f(n + 2r) f(n + 3r). \]

As it turns out, the former average will be best controlled using the $s = 1$
regularity and counting lemmas, while the latter requires the $s = 2$ versions
of these lemmas. In this paper we shall see several examples of these types
of applications of the two lemmas.
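
To make the two averages above concrete, here is a minimal brute-force sketch of the weighted count of $k$-term progressions. It assumes that $f$ is extended by zero outside $[N]$ (the convention stated later in Theorem 1.13); the function name `ap_average` is a hypothetical choice for illustration.

```python
from fractions import Fraction


def ap_average(f, N, k):
    """E_{n,r in [N]} f(n) f(n+r) ... f(n+(k-1)r), with f taken to be 0 outside [N]."""
    def value(m):
        # Assumption: f is extended by zero outside {1, ..., N}.
        return f(m) if 1 <= m <= N else 0

    total = 0
    for n in range(1, N + 1):
        for r in range(1, N + 1):
            term = 1
            for j in range(k):
                term *= value(n + j * r)
            total += term
    # Normalise by the N^2 pairs (n, r) in [N] x [N].
    return Fraction(total) / (N * N)


# Weighted 3-term and 4-term progression counts for the constant function 1.
print(ap_average(lambda n: 1, 10, 3), ap_average(lambda n: 1, 10, 4))
```

This runs in time $O(kN^2)$ and is only meant to pin down the normalisation; it plays no role in the arguments of the paper.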

The arithmetic regularity lemma. We begin by formulating
our regularity lemma. Following the statement we explain the terms used
here.

Theorem 1.2 (Arithmetic regularity lemma). Let $f : [N] \to [0,1]$ be a
function, let $s \geq 1$ be an integer, let $\varepsilon > 0$, and let $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a growth
function. Then there exists a quantity $M = O_{s,\varepsilon,\mathcal{F}}(1)$ and a decomposition

\[ f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \]

of $f$ into functions $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}} : [N] \to [-1, 1]$ of the following form:

(i) ($f_{\mathrm{nil}}$ structured) $f_{\mathrm{nil}}$ is a $(\mathcal{F}(M), N)$-irrational virtual nilsequence
of degree $\leq s$, complexity $\leq M$, and scale $N$;

(ii) ($f_{\mathrm{sml}}$ small) $f_{\mathrm{sml}}$ has an $L^2[N]$ norm of at most $\varepsilon$;

(iii) ($f_{\mathrm{unf}}$ very uniform) $f_{\mathrm{unf}}$ has a $U^{s+1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(iv) (Nonnegativity) $f_{\mathrm{nil}}$ and $f_{\mathrm{nil}} + f_{\mathrm{sml}}$ take values in $[0, 1]$.

Remark. This result easily implies the recently proven inverse conjecture
for the Gowers norms (Theorem 2.1). Conversely, this inverse conjecture,
together with the equidistribution theory of nilsequences, will be the main
ingredient used to prove Theorem 1.2.

We prove this theorem in §2. We turn now to a discussion of the various
concepts used in the above statement. Readers who are interested in
applications may skip ahead to the end of the section.

The $L^2[N]$ norm, used to control $f_{\mathrm{sml}}$, is simply

\[ \|f\|_{L^2[N]} := \bigl( \mathbb{E}_{n \in [N]} |f(n)|^2 \bigr)^{1/2}. \]

We turn next to the Gowers uniformity norm $U^{s+1}[N]$, used to control
$f_{\mathrm{unf}}$. If $f : G \to \mathbb{C}$ is a function on a finite additive group $G$, and $k \geq 1$
is an integer, then the Gowers uniformity norm $\|f\|_{U^k(G)}$ is defined by the
formula

\[ \|f\|_{U^k(G)} := \bigl( \mathbb{E}_{x, h_1, \dots, h_k \in G} \Delta_{h_1} \cdots \Delta_{h_k} f(x) \bigr)^{1/2^k}, \]

where $\Delta_h f : G \to \mathbb{C}$ is the multiplicative derivative of $f$ in the direction $h$,
defined by the formula

\[ \Delta_h f(x) := f(x + h) \overline{f(x)}. \]

In this paper we will be concerned with functions on $[N]$, which is not
quite a group. To define the Gowers norms of a function $f : [N] \to \mathbb{C}$, set
$G := \mathbb{Z}/\tilde{N}\mathbb{Z}$ for some integer $\tilde{N} \geq 2^k N$, define a function $\tilde{f} : G \to \mathbb{C}$ by
$\tilde{f}(x) = f(x)$ for $x = 1, \dots, N$ and $\tilde{f}(x) = 0$ otherwise, and set
$\|f\|_{U^k[N]} := \|\tilde{f}\|_{U^k(G)} / \|1_{[N]}\|_{U^k(G)}$, where $1_{[N]}$ is the indicator function of $[N]$.
It is easy to see that this definition is independent of the choice of $\tilde{N}$, and
so for definiteness one could take $\tilde{N} := 2^k N$. Henceforth we shall write
simply $\|f\|_{U^k}$, rather than $\|f\|_{U^k[N]}$, since all Gowers norms will be on $[N]$.
One can show that $\|\cdot\|_{U^k}$ is indeed a norm for any $k \geq 2$, though we shall not
need this here; see [18]. For further discussion of the Gowers norms and
their relevance to counting additive patterns see [18], [27, §5] or [55, §11].
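
The definition above can be evaluated directly for small $N$. The following brute-force sketch embeds $[N]$ into $\mathbb{Z}/\tilde{N}\mathbb{Z}$ with $\tilde{N} = 2^k N$, expands $\Delta_{h_1} \cdots \Delta_{h_k} f(x)$ as a product over the $2^k$ vertices $x + \omega \cdot h$ of a parallelepiped (conjugating the factor when $k - |\omega|$ is odd), and normalises by the norm of the indicator of $[N]$. The function names are illustrative assumptions, and the running time $O(2^k \tilde{N}^{k+1})$ restricts it to toy examples.

```python
import cmath
import math
from itertools import product


def uk_norm_cyclic(values, k):
    """Gowers U^k norm of a function on Z/mZ, given as a list of m values."""
    m = len(values)
    total = 0
    for x in range(m):
        for hs in product(range(m), repeat=k):
            term = 1
            for omega in product((0, 1), repeat=k):
                point = (x + sum(h * w for h, w in zip(hs, omega))) % m
                v = values[point]
                # Delta_h f(x) = f(x+h) * conj(f(x)): conjugate when k - |omega| is odd.
                if (k - sum(omega)) % 2 == 1:
                    v = v.conjugate()
                term *= v
            total += term
    average = total / m ** (k + 1)
    # The average is real and non-negative in exact arithmetic; clip rounding noise.
    return max(average.real, 0.0) ** (1 / 2 ** k)


def uk_norm_interval(f, N, k):
    """U^k[N] norm: extend f by zero to Z/(2^k N)Z and divide by the norm of 1_[N]."""
    m = 2 ** k * N
    f_tilde = [0] * m
    indicator = [0] * m
    for n in range(1, N + 1):
        f_tilde[n] = f(n)
        indicator[n] = 1
    return uk_norm_cyclic(f_tilde, k) / uk_norm_cyclic(indicator, k)


linear = lambda n: cmath.exp(2j * math.pi * 0.3 * n)
quadratic = lambda n: cmath.exp(2j * math.pi * math.sqrt(2) * n * n)
print(uk_norm_interval(linear, 6, 2), uk_norm_interval(quadratic, 6, 2))
```

A linear phase has $U^2[N]$ norm $1$ (up to rounding), since its second multiplicative derivative is identically $1$ on every parallelogram inside $[N]$, whereas the quadratic phase has strictly smaller $U^2[N]$ norm.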

Finally, we turn to the notion of an irrational virtual nilsequence, which is
the concept that defines the structural component $f_{\mathrm{nil}}$. This is the most
complicated concept, and requires a certain number of preliminary definitions.
We first need the notion of a filtered nilmanifold. The first two sections of
[30] may be consulted for a more detailed discussion.

Definition 1.3 (Filtered nilmanifold). Let $s \geq 1$ be an integer. A filtered
nilmanifold $G/\Gamma = (G/\Gamma, G_\bullet)$ of degree $\leq s$ consists of the following data:

(i) A connected, simply-connected nilpotent Lie group $G$;

(ii) A discrete, cocompact subgroup $\Gamma$ of $G$ (thus the quotient space
$G/\Gamma$ is a compact manifold, known as a nilmanifold);

(iii) A filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$ of closed connected subgroups

\[ G = G_{(0)} = G_{(1)} \geq G_{(2)} \geq \cdots \]

of $G$, which are rational in the sense that the subgroups $\Gamma_{(i)} := \Gamma \cap G_{(i)}$
are cocompact in $G_{(i)}$, such that $[G_{(i)}, G_{(j)}] \subseteq G_{(i+j)}$ for all
$i, j \geq 0$, and such that $G_{(i)} = \{\mathrm{id}\}$ whenever $i > s$;

(iv) A Mal'cev basis $X = (X_1, \dots, X_{\dim(G)})$ adapted to $G_\bullet$, that is to say
a basis $X_1, \dots, X_{\dim(G)}$ of the Lie algebra of $G$ that exponentiates
to elements of $\Gamma$, such that $X_j, \dots, X_{\dim(G)}$ span a Lie algebra ideal
for all $1 \leq j \leq \dim(G)$, and $X_{\dim(G) - \dim(G_{(i)}) + 1}, \dots, X_{\dim(G)}$ spans
the Lie algebra of $G_{(i)}$ for all $1 \leq i \leq s$. (For a detailed discussion
of this concept, see [30, §2].)

Once a Mal'cev basis has been specified, notions such as the rationality
of subgroups may be quantified in terms of it. Furthermore one may use a
Mal'cev basis to define a metric $d_{G/\Gamma}$ on the nilmanifold $G/\Gamma$. The results of
this paper are rather insensitive to the precise metric that one takes, but one
may proceed for example as in [30, Definition 2.2]. We encourage the reader
not to think too carefully about the precise definition (or about Mal'cev
bases in general), but it is certainly important to have some definite metric
in mind so that one can make sense of notions such as that of a Lipschitz
function on $G/\Gamma$.

Observe that every filtered nilmanifold $G/\Gamma$ comes with a canonical
probability Haar measure $\mu_{G/\Gamma}$, defined as the unique Borel probability
measure on $G/\Gamma$ that is invariant under the left action of $G$. We abbreviate
$\int_{G/\Gamma} F(x) \, d\mu_{G/\Gamma}(x)$ as $\int_{G/\Gamma} F$.

We will need a quantitative notion of complexity for filtered nilmanifolds,
though once again, the precise definition is somewhat unimportant.

Definition 1.4 (Complexity). Let $M \geq 1$. We say that a filtered nilmanifold
$G/\Gamma = (G/\Gamma, G_\bullet)$ has complexity $\leq M$ if the dimension of $G$, the degree
of $G_\bullet$, and the rationality of the Mal'cev basis $X$ (cf. [30, Definition 2.4])
are bounded by $M$.

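Heisenberg example. The model example of a degree $\leq 2$ filtered nilmanifold
is the Heisenberg nilmanifold

\[ G/\Gamma := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} \Big/ \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix} \]

with the lower central series filtration $G_{(0)} = G_{(1)} = G$ and

\[ G_{(2)} = [G, G] = \begin{pmatrix} 1 & 0 & \mathbb{R} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

with Mal'cev basis $X = \{X_1, X_2, X_3\}$ consisting of the matrices

\[ X_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad X_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad X_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]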
With the definition of filtered nilmanifold in place, the next thing we need
is the idea of a polynomial sequence. The basic theory of such sequences was
laid out in Leibman [37], and was extended slightly to general filtrations in
[30]. An extensive discussion may be found in Section 6 of that paper.

Definition 1.5 (Polynomial sequence). Let $(G/\Gamma, G_\bullet)$ be a filtered
nilmanifold, with filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$. A (multidimensional) polynomial
sequence adapted to this filtered nilmanifold is a sequence $g : \mathbb{Z}^D \to G$ for
some $D \geq 1$ with the property that

\[ \partial_{h_1} \cdots \partial_{h_i} g(n) \in G_{(i)} \]

for all $i \geq 0$ and $h_1, \dots, h_i, n \in \mathbb{Z}^D$, where $\partial_h g(n) := g(n + h) g(n)^{-1}$ is the
derivative of $g$ with respect to the shift $h$. The space of all such polynomial
sequences will be denoted $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$. The space of polynomial sequences
taking values in $\Gamma$ will be denoted $\mathrm{poly}(\mathbb{Z}^D, \Gamma_\bullet)$. When $D = 1$, we refer to
multidimensional polynomial sequences simply as polynomial sequences.

Remark. We will be primarily interested in the one-dimensional case
$D = 1$, but will need the higher $D$ case in order to establish the counting
lemma, Theorem 1.11.

One of the main reasons why we work with polynomial sequences, instead
of just linear sequences such as $n \mapsto g_0 g^n$, is that objects of the former type
constitute a group.

Theorem 1.6 (Lazard-Leibman). If $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold and
$D \geq 1$ is an integer, then $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a group (and $\mathrm{poly}(\mathbb{Z}^D, \Gamma_\bullet)$ is a
subgroup). □

With the concept of a polynomial sequence in hand, it is easy to define a
polynomial orbit.

Definition 1.7 (Orbits). Let $D, s \geq 1$ be integers, and $M, A > 0$ be
parameters. A (multidimensional) polynomial orbit of degree $\leq s$ and
complexity $\leq M$ is any function$^3$ $n \mapsto g(n)\Gamma$ from $\mathbb{Z}^D \to G/\Gamma$, where $(G/\Gamma, G_\bullet)$
is a filtered nilmanifold of complexity $\leq M$, and $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a
(multidimensional) polynomial sequence.

Using the concept of polynomial orbit, we can define the notion of a
(polynomial) nilsequence, as well as a generalisation which we call a virtual
nilsequence, in analogy with virtually nilpotent groups (groups with a finite
index nilpotent subgroup).

$^3$ Strictly speaking, the orbit is the tuple of data $(G, \Gamma, G/\Gamma, G_\bullet, n \mapsto g(n)\Gamma)$, rather
than just the sequence $n \mapsto g(n)\Gamma$, but we shall abuse notation and use the sequence as a
metonym for the whole orbit.
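
To illustrate Definition 1.5 in the simplest setting, the sketch below treats the abelian model case $G = \mathbb{R}$, written additively, with $G_{(i)} = \mathbb{R}$ for $i \leq s$ and $G_{(i)} = \{0\}$ for $i > s$. This choice of group and filtration is an assumption made for the example. The derivative becomes $\partial_h g(n) = g(n+h) - g(n)$, and the definition reduces to the statement that every $(s+1)$-fold iterated derivative vanishes, which is to say that $g$ is an ordinary polynomial of degree at most $s$. The function names are hypothetical, and the check is carried out only over a finite window of shifts and base points.

```python
from itertools import product


def derivative(g, h):
    """Additive analogue of d_h g(n) = g(n+h) g(n)^{-1} for G = (R, +)."""
    return lambda n: g(n + h) - g(n)


def iterated_derivative(g, shifts):
    """Apply d_{h_1}, ..., d_{h_i} in turn."""
    for h in shifts:
        g = derivative(g, h)
    return g


def is_adapted(g, s, window):
    """Check d_{h_1}...d_{h_{s+1}} g(n) = 0 for all shifts and points in a finite window."""
    points = list(window)
    for shifts in product(points, repeat=s + 1):
        d = iterated_derivative(g, shifts)
        if any(d(n) != 0 for n in points):
            return False
    return True


quadratic = lambda n: 3 * n * n - n + 5
print(is_adapted(quadratic, 2, range(-2, 3)), is_adapted(quadratic, 1, range(-2, 3)))
```

For the quadratic above the second derivative is $6 h_1 h_2$, which is non-zero in general, so the sequence is adapted to the degree $\leq 2$ filtration but not to the degree $\leq 1$ filtration.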
Definition 1.8 (Nilsequences). A (multidimensional, polynomial) nilsequence
of degree $\leq s$ and complexity $\leq M$ is any function $f : \mathbb{Z}^D \to \mathbb{C}$
of the form $f(n) = F(g(n)\Gamma)$, where $n \mapsto g(n)\Gamma$ is a polynomial orbit of
degree $\leq s$ and complexity $\leq M$, and $F : G/\Gamma \to \mathbb{C}$ is a function of Lipschitz
norm$^4$ at most $M$.

Definition 1.9 (Virtual nilsequences). Let $N \geq 1$. A virtual nilsequence of
degree $\leq s$ and complexity $\leq M$ at scale $N$ is any function $f : [N] \to \mathbb{C}$ of
the form $f(n) = F(g(n)\Gamma, n \ (\mathrm{mod}\ q), n/N)$, where $1 \leq q \leq M$ is an integer,
$n \mapsto g(n)\Gamma$ is a polynomial orbit of degree $\leq s$ and complexity $\leq M$, and
$F : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R} \to \mathbb{C}$ is a function of Lipschitz norm at most $M$. (Here
we place a metric on $G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R}$ in some arbitrary fashion, e.g. by
embedding $\mathbb{Z}/q\mathbb{Z}$ in $\mathbb{R}/\mathbb{Z}$ and taking the direct sum of the metrics on the
three factors.)

One concept that featured in Theorem 1.2 remains to be defined: that of
an irrational orbit. The definition is a little technical (see Definition A.6)
and takes some setting up, and so we defer it and the discussion of some
motivating examples to Appendix A. Very roughly speaking, an irrational
orbit is one whose coefficients are not close to rationals (of bounded height)
and for which the filtration $G_\bullet$ is as small as possible. For instance, with a
polynomial sequence $P : [N] \to \mathbb{R}/\mathbb{Z}$ of the form $P(n) = \alpha_s n^s + \dots + \alpha_0$,
then (roughly speaking) this sequence would be considered irrational if one
takes $G_{(i)} = \mathbb{R}$ for $i \leq s$ and $G_{(i)} = \{0\}$ for $i > s$, and if there was no
positive integer $q = O(1)$ for which $\|q \alpha_s\|_{\mathbb{R}/\mathbb{Z}} \ll N^{-s}$. Again, we refer the
reader to Appendix A for further examples and discussion.
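
The rough criterion in the preceding paragraph can be turned into a toy test. The sketch below searches for a small denominator $q$ with $\|q\alpha_s\|_{\mathbb{R}/\mathbb{Z}} \leq C N^{-s}$; the explicit cutoff `max_q` (standing in for "$q = O(1)$") and the constant `C` (standing in for the implied constant in $\ll$) are hypothetical parameters introduced here, and this heuristic is not the formal Definition A.6.

```python
import math
from fractions import Fraction


def dist_to_nearest_integer(x):
    """The norm ||x||_{R/Z}: distance from x to the nearest integer."""
    return abs(x - round(x))


def small_denominator_witness(alpha_s, s, N, max_q, C=1):
    """Least q in 1..max_q with ||q * alpha_s||_{R/Z} <= C * N^{-s}, or None if there is none.

    max_q and C are illustrative stand-ins for the unspecified O(1) and << constants.
    """
    threshold = C * Fraction(1, N ** s)
    for q in range(1, max_q + 1):
        if dist_to_nearest_integer(q * alpha_s) <= threshold:
            return q
    return None


# A rational leading coefficient is caught by q = 3; sqrt(2) has no witness up to q = 50.
print(small_denominator_witness(Fraction(1, 3), 2, 100, 50))
print(small_denominator_witness(math.sqrt(2), 2, 100, 50))
```

Returning the witness $q$ (rather than a bare boolean) makes it easy to see why a given coefficient fails the heuristic.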

This concludes our attempt to discuss all the concepts involved in the
arithmetic regularity lemma, Theorem 1.2; we turn now to a statement and
discussion of the counting lemma.

$^4$ The (inhomogeneous) Lipschitz norm $\|F\|_{\mathrm{Lip}}$ of a function $F : X \to \mathbb{C}$ on a metric
space $X = (X, d)$ is defined as

\[ \|F\|_{\mathrm{Lip}} := \sup_{x \in X} |F(x)| + \sup_{x, y \in X : x \neq y} \frac{|F(x) - F(y)|}{|x - y|}. \]

Counting lemma. In applications of the arithmetic regularity lemma,
we will be interested in counting additive patterns such as arithmetic
progressions or parallelepipeds. To understand the phenomena properly it is
advantageous to work in a somewhat general setting similar to that taken in
[22, 23, 24, 31]. In the latter paper one works with a family $\Psi = (\psi_1, \dots, \psi_t)$
of integer-coefficient linear forms (or equivalently, group homomorphisms)
$\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$, and considers expressions such as

\[ \mathbb{E}_{n \in \mathbb{Z}^D \cap P} f(\psi_1(n)) \cdots f(\psi_t(n)) \tag{1.3} \]

where $P$ is a convex subset of $\mathbb{R}^D$. Thus, for instance, if counting arithmetic
progressions, one might use the linear forms

\[ \psi_i(n_1, n_2) := n_1 + (i - 1) n_2; \quad i = 1, \dots, k \tag{1.4} \]

whilst for counting parallelepipeds one might instead use the linear forms

\[ \psi_{\omega_1, \dots, \omega_k}(n_0, n_1, \dots, n_k) := n_0 + \omega_1 n_1 + \dots + \omega_k n_k; \quad \omega_1, \dots, \omega_k \in \{0, 1\}. \tag{1.5} \]

In order to understand the contribution to (1.3) coming from the
structured part $f_{\mathrm{nil}}$ of $f$, one is soon faced with the question of understanding
the equidistribution of the orbit

\[ (g(\psi_1(n))\Gamma, \dots, g(\psi_t(n))\Gamma) \tag{1.6} \]

inside $(G/\Gamma)^t$, where $n = (n_1, \dots, n_D)$ ranges over $\mathbb{Z}^D \cap P$. We abbreviate
this orbit as $g^\Psi(n)\Gamma^t$, where $g^\Psi : \mathbb{Z}^D \to G^t$ is the polynomial sequence

\[ g^\Psi(n) := (g(\psi_1(n)), \dots, g(\psi_t(n))). \tag{1.7} \]

A very useful model for this question, in which infinite orbits were considered
in the "linear" case $g(n) = g^n x$, was studied by Leibman [41]. His work leads
one to the following definition.

Definition 1.10 (The Leibman group). Let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection
of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. For any $i \geq 1$, define $\Psi^{[i]}$ to be the
linear subspace of $\mathbb{R}^t$ spanned by the vectors $(\psi_1^j(n), \dots, \psi_t^j(n))$ for $1 \leq j \leq i$
and $n \in \mathbb{Z}^D$. Given a filtered nilmanifold $(G/\Gamma, G_\bullet)$, we define the Leibman
group $G^\Psi \trianglelefteq G^t$ to be the Lie subgroup of $G^t$ generated by the elements $g_i^{v_i}$
for $i \geq 1$, $g_i \in G_{(i)}$, and $v_i \in \Psi^{[i]}$, with the convention that$^5$

\[ g^{(v_1, \dots, v_t)} := (g^{v_1}, \dots, g^{v_t}) \]

for each $g \in G$. Note that $G^\Psi$ is normal in $G^t$ because $G_{(i)}$ is normal in
$G$. We will show in §3 that $G^\Psi$ is also a rational subgroup of $G^t$, thus
$\Gamma^\Psi := \Gamma^t \cap G^\Psi$ is a discrete cocompact subgroup of $G^\Psi$.

Examples. Two particular instances of this construction correspond to the
two families of linear forms (1.4) and (1.5) above. In the case of arithmetic
progressions, where $\Psi$ is as in (1.4), the Leibman group $G^\Psi$ is sometimes (see,
for example, [12]) referred to as the Hall-Petresco group $\mathrm{HP}^k(G_\bullet)$ and has the
particularly simple alternative description

\[ \mathrm{HP}^k(G_\bullet) = G^\Psi = \{ (g(0), \dots, g(k-1)) : g \in \mathrm{poly}(\mathbb{Z}, G_\bullet) \}. \]

We will prove this fact in §3. In the case of parallelepipeds, where $\Psi$ is as
in (1.5), the Leibman group $G^\Psi$ has been referred to as the Host-Kra cube
group [31] and it too has an alternative description. See [31, Appendix E]
for more information: we will not be making use of this particular group
here.

$^5$ We define $g^v$ for real $v$ by the formula $g^v := \exp(v \log(g))$, where $\exp : \mathfrak{g} \to G$ is the
usual exponential map from the Lie algebra $\mathfrak{g}$ to $G$ (this is a homeomorphism since $G$ is
nilpotent, connected, and simply connected).
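
The subspaces $\Psi^{[i]}$ of Definition 1.10 can be computed explicitly in small cases. The sketch below does this for the arithmetic progression forms (1.4), where one finds $\dim \Psi^{[i]} = \min(i+1, k)$. It relies on the observation that a polynomial of degree at most $j$ in each variable that vanishes on the grid $\{0, \dots, j\}^D$ vanishes identically, so sampling $n$ on that grid already captures the full span; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over Q of a list of row vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def ap_forms(k):
    """Coefficient tuples of psi_i(n_1, n_2) = n_1 + (i-1) n_2 for i = 1, ..., k, as in (1.4)."""
    return [(1, i) for i in range(k)]


def evaluate(form, n):
    return sum(c * x for c, x in zip(form, n))


def leibman_subspace_dim(forms, i):
    """Dimension of the span of (psi_1(n)^j, ..., psi_t(n)^j) over 1 <= j <= i and n in Z^D."""
    D = len(forms[0])
    vectors = []
    for j in range(1, i + 1):
        # Sampling n on {0, ..., j}^D suffices, by the interpolation remark above.
        for n in product(range(j + 1), repeat=D):
            vectors.append([evaluate(psi, n) ** j for psi in forms])
    return rank(vectors)


print([leibman_subspace_dim(ap_forms(4), i) for i in range(1, 5)])
```

For 4-term progressions the dimensions are 2, 3, 4, 4: the vectors $((n_1 + \ell n_2)^j)_{\ell = 0, \dots, 3}$ with $j \leq i$ span exactly the polynomials of degree at most $i$ in $\ell$, restricted to four points.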

Let $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ be a polynomial sequence, and let $\Psi = (\psi_1, \dots, \psi_t)$
be a collection of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. It turns out (see Lemma
3.2) that the sequence $g^\Psi$ takes values in $G^\Psi$. More remarkably, the orbit
(1.6) is in fact totally equidistributed on $G^\Psi/\Gamma^\Psi$ if $g$ is sufficiently irrational.
It is this result that we refer to as our counting lemma.

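Theorem 1.11 (Counting lemma). Let $M, D, t, s$ be integers with $1 \leq D, t,
s \leq M$, let $(G/\Gamma, G_\bullet)$ be a degree $\leq s$ filtered nilmanifold of complexity $\leq M$,
let $g : \mathbb{Z} \to G$ be an $(A, N)$-irrational polynomial sequence adapted to $G_\bullet$,
let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$
with coefficients of magnitude at most $M$, and let $P$ be a convex subset of
$[-N, N]^D$. Then for any Lipschitz function $F : (G/\Gamma)^t \to \mathbb{C}$ of Lipschitz
norm at most $M$, one has$^6$

\[ \sum_{n \in \mathbb{Z}^D \cap P} F(g^\Psi(n)\Gamma^t) = \mathrm{vol}(P) \int_{g(0)^\Delta G^\Psi/\Gamma^\Psi} F + o_{A \to \infty; M}(N^D) + o_{N \to \infty; M}(N^D), \]

where $g(0)^\Delta := (g(0), \dots, g(0)) \in G^t$ and the integral is with respect to the
probability Haar measure $\mu_{g(0)^\Delta G^\Psi/\Gamma^\Psi}$ on the coset

\[ g(0)^\Delta G^\Psi/\Gamma^\Psi, \]

viewed as a subnilmanifold of $(G/\Gamma)^t$, and $\mathrm{vol}(P)$ is the Lebesgue measure
of $P$ in $\mathbb{R}^D$.

More generally, whenever $\Lambda \leq \mathbb{Z}^D$ is a sublattice of index $[\mathbb{Z}^D : \Lambda] \leq M$,
and $n_0 \in \mathbb{Z}^D$, one has

\[ \sum_{n \in (n_0 + \Lambda) \cap P} F(g^\Psi(n)\Gamma^t) = \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} \int_{g(0)^\Delta G^\Psi/\Gamma^\Psi} F + o_{A \to \infty; M}(N^D) + o_{N \to \infty; M}(N^D). \]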

The counting lemma is, of course, best understood by seeing it in action,
as we shall do several times later on. The errors $o_{A \to \infty; M}(N^D)$ and
$o_{N \to \infty; M}(N^D)$ are negligible in most applications, as $A$ will typically be a
huge function $\mathcal{F}(M)$ of $M$, and $N$ can also be taken to be arbitrarily large
compared to $M$.

We remark that one could easily extend the above lemma to control
averages of virtual irrational nilsequences, rather than just irrational sequences,
by introducing some additional integrations over the local factors $\mathbb{Z}/q\mathbb{Z}$ and
$\mathbb{R}$, but this would require even more notation than is currently being used
and so we do not describe such an extension here.

$^6$ We use $o_{A \to \infty; M}(X)$ to denote a quantity bounded in magnitude by $c_M(A) X$, where
$c_M(A) \to 0$ as $A \to \infty$ for fixed $M$. Similarly for other choices of subscripts.
Applications. The proofs of the regularity and counting lemmas occupy
about half the paper. In the remaining half, we give a number of
applications of these results to problems in additive combinatorics. The
scheme of the arguments in all of these cases is similar. First, one applies
the arithmetic regularity lemma to decompose the relevant function $f$ into
structured, small, and (very) uniform components $f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}}$.
Very roughly speaking, these are analysed as follows:

(i) $f_{\mathrm{nil}}$ is studied using algebraic properties of nilsequences, particularly
the counting lemma;

(ii) $f_{\mathrm{sml}}$ is shown to be negligible, though often (unfortunately) some
additional algebraic input is required to ensure that this error does
not conspire to destroy the contribution from $f_{\mathrm{nil}}$;

(iii) $f_{\mathrm{unf}}$ is easily shown to be negligible using results of "generalised von
Neumann" type as discussed in §4.

As we shall see, dealing with the error $f_{\mathrm{sml}}$ can cause a certain amount
of pain. To show that this error is truly negligible, one often has to prove
that patterns guaranteed by $f_{\mathrm{nil}}$ (such as arithmetic progressions) do not
concentrate on some small set which might be contained in the support of
$f_{\mathrm{sml}}$.

We now give specific examples of this paradigm. In §6 we give a "new"
proof of Szemerédi's famous theorem on arithmetic progressions. This is
hardly exciting nowadays, with at least 16 proofs already in the literature 
[21 El El GDI III IIBI IIS IS 03 03 EH E21 EB] as well as (slightly implicitly) 
in [U l35l 158]. However this proof makes the point that for a certain class of 
problems it suffices to "check the result for nilsequences", and in so doing 
one really sees the structure of the problem. Just as random and structured 
graphs are two obvious classes to test conjectures against in graph theory, 
we would like to raise awareness of nilsequences as potential (and, in certain 
cases such as this one, the only) sources of counterexamples. 

The second application, proven in §5, is to establish a conjecture of
Bergelson, Host and Kra [4]. Here and in the sequel we use the notation
$X \ll_{\alpha,\varepsilon} Y$ or $Y \gg_{\alpha,\varepsilon} X$ synonymously with $X = O_{\alpha,\varepsilon}(Y)$, and similarly for
other choices of subscripts.

Theorem 1.12 (Bergelson-Host-Kra conjecture). Let $k = 1, 2, 3$ or $4$, and
suppose that $0 < \alpha < 1$ and $\varepsilon > 0$. Then for any $N \geq 1$ and any subset
$A \subseteq [N]$ of density $|A| \geq \alpha N$, one can find $\gg_{\alpha,\varepsilon} N$ values of $d \in [-N, N]$
such that there are at least $(\alpha^k - \varepsilon) N$ $k$-term arithmetic progressions in $A$
with common difference $d$.

Remarks. The claim is trivial for $k = 1$, and follows from an easy
averaging argument when $k = 2$. This theorem was established in the case $k = 3$
by the first author in [25]: we give a new proof of this result which may be
of independent interest. The case $k = 4$ is new, although a finite field
analogue of this result previously appeared in lecture notes of the first author
[26] (reporting on joint work). Our proof of the $k = 4$ case relies on
the inverse conjecture for the $U^3$ norm, proven in [28]. A counterexample
of Ruzsa in the appendix to [4] shows that Theorem 1.12 fails when
$k \geq 5$. An ergodic counterpart to Theorem 1.12 (which, roughly speaking,
replaces a single scale $N$ with a sequence of scales going to infinity and takes
a limit), using a related but slightly different argument, was established in
[4].

Finally, in §7 we establish a generalisation of a recent result of Gowers
and Wolf [22, 23, 24] regarding the "true" complexity of a system of linear
forms.

Theorem 1.13. Let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection of linear forms from
$\mathbb{Z}^D$ to $\mathbb{Z}$, and let $s \geq 1$ be an integer such that the polynomials
$\psi_1^{s+1}, \dots, \psi_t^{s+1}$ are linearly independent. Then for any function $f : [N] \to \mathbb{C}$
bounded in magnitude by 1 (and defined to be zero outside of $[N]$) obeying the
bound $\|f\|_{U^{s+1}[N]} \leq \delta$ for some $\delta > 0$, one has

\[ \mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) = o_{\delta \to 0; s, D, t, \Psi}(1). \]

Remarks. This result was conjectured in [22], where it was shown that the
linear independence hypothesis was necessary. The programme in [22, 23, 24]
gives an alternate approach to this result that avoids explicit mention of
nilsequences, and in particular establishes the counterpart to Theorem 1.13
in finite characteristic; their work also gives a proof of this theorem in the
case when the Cauchy-Schwarz complexity of the system (see Theorem 4.1)
is at most two, and with better bounds than our result, which is all but
ineffective. It is worth mentioning that the arguments in [22, 23, 24] also
develop several structural decomposition theorems along the lines of
Theorem 1.2, but using the language of (high-rank) locally polynomial phases
rather than (irrational) nilsequences.
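
The linear independence hypothesis of Theorem 1.13 is easy to test by machine for a given system of forms. The sketch below finds the least $s \geq 1$ for which $\psi_1^{s+1}, \dots, \psi_t^{s+1}$ are linearly independent, by evaluating these polynomials on the grid $\{0, \dots, s+1\}^D$ (which determines any polynomial of degree at most $s+1$ in each variable) and computing a rank over $\mathbb{Q}$. The function names and the search bound `max_s` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over Q of a list of row vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def powers_independent(forms, power):
    """True if the polynomials psi_i^power are linearly independent over Q."""
    D = len(forms[0])
    grid = list(product(range(power + 1), repeat=D))
    # One row per form: its power evaluated at every grid point.
    rows = [[sum(c * x for c, x in zip(psi, n)) ** power for n in grid] for psi in forms]
    return rank(rows) == len(forms)


def least_admissible_s(forms, max_s=10):
    """Least s >= 1 satisfying the hypothesis of Theorem 1.13, or None up to max_s."""
    for s in range(1, max_s + 1):
        if powers_independent(forms, s + 1):
            return s
    return None


def ap_forms(k):
    """Coefficient tuples of n_1 + (i-1) n_2 for i = 1, ..., k, as in (1.4)."""
    return [(1, i) for i in range(k)]


print([least_admissible_s(ap_forms(k)) for k in (3, 4, 5)])
```

For $k$-term progressions this returns $s = k - 2$, consistent with the earlier remark that 3-term progressions are handled by the $s = 1$ lemmas and 4-term progressions by the $s = 2$ lemmas.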

Relation to previous work. A result closely related to Theorem
1.2 in the case $s = 1$ was proved by Bourgain as long ago as 1989 [7].
In that paper, the decomposition was applied to give a different proof of
Roth's theorem, that is to say Szemerédi's theorem for 3-term progressions.
A different take on this result was supplied by the first author in [25], where
the application to the case $k = 3$ of the Bergelson-Host-Kra conjecture was
noted. In that same paper a construction of Gowers [16] was modified to
show that any application of the arithmetic regularity lemma must lead to
awful (tower-type) bounds; the same kind of construction would show that
the cases $s \geq 2$ of Theorem 1.2 lead to tower-type bounds as well. In$^7$ [26]
the analogue of the case $s = 2$ of Theorem 1.2 in a finite field setting was
stated, proved, and used to deduce the finite field analogue of the
Bergelson-Host-Kra conjecture in the case $k = 4$. In that same paper the present work
was promised (as reference [22]) at "some future juncture". Four years later
we have reached that juncture and we apologise for the delay. We note,
however, that until the very recent resolution of the inverse conjectures for
the Gowers norms [33, 34] many of our results would have been conditional;
furthermore, we are heavily dependent on our work [30], which had not been
envisaged when the earlier promise was made.

$^7$ The relevant part of these lecture notes by the first author reported on joint work of
the two of us.

In the meantime a greater general understanding of decomposition
theorems of this type has developed through the work of Gowers [21],
Reingold-Trevisan-Tulsiani-Vadhan [36], and Gowers-Wolf [22, 23, 24]; see also the
survey [53] of the second author. While Theorem 1.2 is related to several of
these general decomposition theorems, it also relies upon specific structure
of nilmanifolds. In any case it seems appropriate, in this volume, to give a
proof using the "energy increment argument" pioneered by Szemerédi.

The ergodic theory analogue of Theorem 1.2 is the classification of
characteristic factors for the Gowers-Host-Kra seminorms $\|\cdot\|_{U^{s+1}(X)}$ (the
ergodic theory counterpart of the Gowers norms) as inverse limits of
nilsystems, which was first established by Host and Kra [35]. Roughly speaking,
this classification allows one to decompose any bounded non-negative
function $f \in L^\infty(X)$ in an (ergodic) measure-preserving system as a sum
$f = f_{\mathrm{str}} + f_{\mathrm{sml}} + f_{\mathrm{unf}}$, where $\|f_{\mathrm{unf}}\|_{U^{s+1}(X)} = 0$, $f_{\mathrm{sml}}$ is as small as one wishes
in the $L^2(X)$ norm, and $f_{\mathrm{str}}$ arises from an $s$-step nilsystem factor of $X$. This
fundamental decomposition has many applications; for instance, in [4] it was
used (together with the Furstenberg correspondence principle) to establish
an ergodic analogue of Theorem 1.12 in which $A$ is a set of integers rather
than a finite subset of $[N]$, with the notion of upper density replacing the
notion of cardinality. It appears however that this correspondence principle
does not directly yield "single-scale" results such as Theorem 1.12 from the
ergodic theory results.
2. Proof of the arithmetic regularity lemma

We now prove Theorem 1.2. The proof proceeds in two main stages.
Firstly, we establish a "non-irrational regularity lemma", which establishes
a weaker version of Theorem 1.2 in which the structured component $f_{\mathrm{nil}}$ is a
polynomial nilsequence, but one which is not assumed to be irrational. The
main tool here is the inverse conjecture GI($s$) for the Gowers norms [34],
combined with the energy incrementation argument that appears in proofs
of the graph regularity lemma. In the second stage, we upgrade this weaker
regularity lemma to the full regularity lemma by converting the nilsequence
to an irrational nilsequence. The main tool here is a dimension reduction
argument and a factorisation of nilsequences similar to that appearing in
The non-irrational regularity lemma. We begin the first stage 
of the argument. As mentioned above, the key ingredient is the following 
result. 

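Theorem 2.1 (GI(s)). Let $s \geq 1$, and suppose that $f : [N] \to \mathbb{C}$
is a function bounded in magnitude by 1 such that
$\|f\|_{U^{s+1}[N]} \geq \delta$ for some $\delta > 0$. Then there is a degree
$\leq s$ polynomial nilsequence $\psi : \mathbb{Z} \to \mathbb{C}$ of
complexity $O_{s,\delta}(1)$ such that
$|\langle f, \psi \rangle_{L^2[N]}| \gg_{s,\delta} 1$, where

\[ \langle f, \psi \rangle_{L^2[N]} := \mathbb{E}_{n \in [N]} f(n) \overline{\psi(n)} \]

is the usual inner product.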

Remark. The difficulty of this conjecture increases with $s$. The case
$s = 1$ easily follows from classical harmonic analysis. The case $s = 2$ was
established by the authors in [28], building upon the breakthrough paper of
Gowers [17]. The case $s = 3$ was recently established by the authors and
Ziegler in [33], and the general case will appear in the forthcoming paper
[33] by the authors and Ziegler.
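
To make the case $s = 1$ concrete, here is a minimal numerical sketch. It works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ rather than on the interval $[N]$ (an assumption made for simplicity; the normalisation of the $U^2[N]$ norm differs slightly), and the function names are illustrative only. It checks the classical identity $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4$, which for a 1-bounded $f$ gives $\|f\|_{U^2}^4 \leq \max_\xi |\hat f(\xi)|^2$, so a large $U^2$ norm forces correlation with a linear phase.

```python
import cmath

def fourier_coefficients(f):
    """Normalised Fourier coefficients of f on Z/NZ (assumed cyclic model)."""
    N = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
        for xi in range(N)
    ]

def u2_norm_fourth_power(f):
    """||f||_{U^2}^4 computed directly from the definition on Z/NZ."""
    N = len(f)
    total = 0j
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (
                    f[x]
                    * f[(x + h1) % N].conjugate()
                    * f[(x + h2) % N].conjugate()
                    * f[(x + h1 + h2) % N]
                )
    return (total / N**3).real

N = 12
# A 1-bounded test function: half a linear phase plus half a +-1 pattern.
f = [
    0.5 * cmath.exp(2j * cmath.pi * 3 * x / N) + 0.5 * (-1) ** (x * x % 3)
    for x in range(N)
]

direct = u2_norm_fourth_power(f)
coeffs = fourier_coefficients(f)
via_fourier = sum(abs(c) ** 4 for c in coeffs)
largest = max(abs(c) for c in coeffs)
```

Here `direct` and `via_fourier` agree up to rounding, and `largest ** 2` dominates both, which is the inverse statement for $s = 1$ in this toy model.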

For technical reasons, it is convenient to replace the notion of a degree
$\leq s$ polynomial nilsequence by a slightly different concept. The following
definition is not required beyond the end of the proof of Proposition 2.7.

Definition 2.2 (s-measurability). Let $\Phi : \mathbb{R}^+ \to \mathbb{R}^+$
be a growth function and $s \geq 1$. A subset $E \subseteq [N]$ is said to be
s-measurable with growth function $\Phi$ if for every $M \geq 1$, there exists
a degree $\leq s$ polynomial nilsequence $\psi : \mathbb{Z} \to [0, 1]$ of
complexity $\leq \Phi(M)$ such that

\[ \|\psi - 1_E\|_{L^2[N]} \leq 1/M. \]

An example of a 1-measurable set would be a regular Bohr set, as
introduced in [8] and discussed further in [28, §2]. We will not need Bohr
sets elsewhere in this paper, so we shall not dwell any longer on this example.
However the reader will see ideas related to the basic theory of those sets in
the proof of Corollary 2.3 below.

We make the simple but crucial observation that if $E, F$ are s-measurable
with some growth functions $\Phi, \Phi'$ respectively, then boolean
combinations of $E, F$ such as $E \cap F$, $E \cup F$, or $[N] \setminus E$ are
also s-measurable with some growth function depending on $\Phi, \Phi'$.
Underlying this, of course, is the fact that the product and sum of two
nilsequences is also a nilsequence, and hence the set of nilsequences forms a
kind of algebra (graded by complexity). The role of algebraic structure of this
kind was brought to the fore in the work of Gowers [21] cited above.

Theorem 2.1 then implies






Corollary 2.3 (Alternate formulation of GI(s)). Let $s \geq 1$, and suppose
that $f : [N] \to [-1, 1]$ is such that $\|f\|_{U^{s+1}[N]} \geq \delta$ for
some $\delta > 0$. Then there exists a growth function $\Phi_{s,\delta}$
depending only on $s, \delta$, and an s-measurable set $E \subseteq [N]$ with
growth function $\Phi_{s,\delta}$, such that

\[ |\mathbb{E}_{n \in [N]} f(n) 1_E(n)| \gg_{s,\delta} 1. \]

Proof. We allow implied constants to depend on $s, \delta$. By Theorem 2.1,
there exists a degree $\leq s$ polynomial nilsequence $\psi$ of complexity
$O(1)$ such that

\[ |\mathbb{E}_{n \in [N]} f(n) \psi(n)| \gg 1. \]

By taking real and imaginary parts of $\psi$, and then positive and negative
parts, and rescaling, we may assume without loss of generality that $\psi$
takes values in $[0, 1]$. By Fubini's theorem, we then have

\[ \left| \int_0^1 \mathbb{E}_{n \in [N]} f(n) 1_{E_t}(n)\, dt \right| \gg 1, \]

where $E_t := \{n \in [N] : \psi(n) \geq t\}$. We thus see that there is a
subset $\Omega \subseteq [0, 1]$ of Lebesgue measure $\gg 1$ such that

\[ |\mathbb{E}_{n \in [N]} f(n) 1_{E_t}(n)| \gg 1 \]

uniformly for all $t \in \Omega$.

It remains to show that at least one of the $E_t$ is s-measurable with respect
to a suitable growth function. For any $t \in \mathbb{R}$, we consider the
maximal function

\[ M(t) := \sup_{r > 0} \frac{1}{2rN} |\{n \in [N] : |\psi(n) - t| \leq r\}|. \]

From the Hardy-Littlewood maximal inequality or the Besicovitch covering
lemma we have that the set $\{t \in \mathbb{R} : M(t) \geq \lambda\}$ has
Lebesgue measure $O(1/\lambda)$ for any $\lambda > 0$. Thus, we can find
$t \in \Omega$ such that $M(t) = O(1)$. Fixing such a $t$, we then see that

\[ |\{n \in [N] : |\psi(n) - t| \leq r\}| \ll rN \]

for all $r > 0$. As a consequence, for any $r > 0$, one can then approximate
$1_{E_t}$ to within $O(\sqrt{r})$ in $L^2[N]$ norm by a Lipschitz function of
$\psi$ with Lipschitz norm $O(1/r)$. This implies that $1_{E_t}$ is
s-measurable with some growth function $\Phi$ depending only on $s, \delta$,
and the claim follows. □

We rephrase this fact in terms of conditional expectations. The following
definition, like Definition 2.2, will only be needed until the end of the proof
of Proposition 2.7.



Here we are, in some sense, finding a "regular" nil-Bohr set
$\{n \in [N] : \psi(n) \geq t\}$, that is to say one rather insensitive to
small changes in the value of $t$. A similar idea also appears in
[46, Claim 2.2].

Definition 2.4 (s-factors). An s-factor $\mathcal{B}$ of complexity $\leq M$
and growth function $\Phi$ is a partition of $[N]$ into at most $M$ sets (or
cells) $E_1, \ldots, E_m$ which are s-measurable of growth function $\Phi$.
Given an s-factor $\mathcal{B}$ and a function $f : [N] \to \mathbb{C}$, we
define the conditional expectation
$\mathbb{E}(f|\mathcal{B}) : [N] \to \mathbb{C}$ of $f$ with respect to the
s-factor to be the function which equals $\mathbb{E}_{n \in E_j} f(n)$ on each
cell $E_j$ of the partition. We define the index or energy
$\mathcal{E}(\mathcal{B})$ of the s-factor $\mathcal{B}$ relative to $f$ to be
the quantity $\|\mathbb{E}(f|\mathcal{B})\|_{L^2[N]}^2$.

An s-factor $\mathcal{B}'$ is said to refine another $\mathcal{B}$ if every
cell of $\mathcal{B}'$ is contained in a cell of $\mathcal{B}$.
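
The conditional expectation and energy just defined can be sketched in a few lines. The following is a toy illustration only: it ignores the s-measurability requirement on the cells, indexes $[N]$ from 0, and uses hypothetical helper names. It also checks that refining a partition cannot decrease the energy, with the gain equal to $\|\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B})\|_{L^2[N]}^2$ (Pythagoras).

```python
def conditional_expectation(f, cells):
    """E(f|B): replace f on each cell by its average over that cell."""
    out = [0.0] * len(f)
    for cell in cells:
        avg = sum(f[n] for n in cell) / len(cell)
        for n in cell:
            out[n] = avg
    return out

def energy(f, cells):
    """Energy ||E(f|B)||^2 in the normalised (averaging) L^2 norm."""
    ef = conditional_expectation(f, cells)
    return sum(v * v for v in ef) / len(f)

def common_refinement(cells, E):
    """Partition generated by B and a set E; at most doubles the cell count."""
    E = set(E)
    refined = []
    for cell in cells:
        inside = [n for n in cell if n in E]
        outside = [n for n in cell if n not in E]
        refined.extend(c for c in (inside, outside) if c)
    return refined

N = 12
f = [(n % 3) / 2 for n in range(N)]            # a function with values in [0, 1]
B = [list(range(0, 6)), list(range(6, 12))]    # a partition into two cells
B_refined = common_refinement(B, [n for n in range(N) if n % 3 == 2])

gain = energy(f, B_refined) - energy(f, B)
difference = [
    a - b
    for a, b in zip(conditional_expectation(f, B_refined), conditional_expectation(f, B))
]
squared_distance = sum(d * d for d in difference) / N
```

In this example `gain` and `squared_distance` coincide up to rounding, which is the form of Pythagoras' theorem used repeatedly below.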

Corollary 2.5 (Lack of uniformity implies energy increment). Let $s \geq 1$,
let $\mathcal{B}$ be an s-factor of complexity $\leq M$ and some growth
function $\Phi$, and suppose that $f : [N] \to [0, 1]$ is such that
$\|f - \mathbb{E}(f|\mathcal{B})\|_{U^{s+1}[N]} \geq \delta$ for some
$\delta > 0$. Then there exists a refinement $\mathcal{B}'$ of $\mathcal{B}$ of
complexity $\leq 2M$ and some growth function depending on
$s, \delta, M, \Phi$, such that

\[ \mathcal{E}(\mathcal{B}') - \mathcal{E}(\mathcal{B}) \gg_{s,\delta} 1. \]

Proof. By Corollary 2.3, we can find an s-measurable set $E$ with a growth
function depending on $s, \delta$ such that

\[ |\langle f - \mathbb{E}(f|\mathcal{B}), 1_E \rangle_{L^2[N]}| \gg_{s,\delta} 1. \qquad (2.1) \]

Now let $\mathcal{B}'$ be the partition generated by $\mathcal{B}$ and $E$;
then $\mathcal{B}'$ clearly has complexity $\leq 2M$ and a growth function
depending on $s, \delta, M, \Phi$. Since $1_E$ is measurable with respect to
the partition $\mathcal{B}'$ (that is to say it is constant on each cell of
this partition), we can rewrite the left-hand side of (2.1) as

\[ |\langle \mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B}), 1_E \rangle_{L^2[N]}| \]

and hence by the Cauchy-Schwarz inequality

\[ \|\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B})\|_{L^2[N]} \gg_{s,\delta} 1. \]

The claim then follows from Pythagoras' theorem. □

We can iterate this to obtain a weak regularity lemma, analogous to the
weak graph regularity lemma of Frieze and Kannan [15].

Corollary 2.6. Let $s \geq 1$, let $\mathcal{B}$ be an s-factor of complexity
$\leq M$ and some growth function $\Phi$, let $f : [N] \to [0, 1]$, and let
$\varepsilon > 0$. Then there exists a refinement $\mathcal{B}'$ of
$\mathcal{B}$ of complexity $O_{s,M,\varepsilon}(1)$ and some growth function
depending on $s, \varepsilon, M, \Phi$, such that

\[ \|f - \mathbb{E}(f|\mathcal{B}')\|_{U^{s+1}[N]} \leq \varepsilon. \qquad (2.2) \]

Proof. We define a sequence of successively more refined factors
$\mathcal{B}'$, starting with $\mathcal{B}' := \mathcal{B}$. If (2.2) already
holds then we are done, so suppose that this is not the case. Then by
Corollary 2.5 we can find a refinement $\mathcal{B}''$ of complexity
$O_{s,M,\varepsilon}(1)$ and some growth function depending on
$s, \varepsilon, M, \Phi$ whose energy is larger than that of $\mathcal{B}'$ by
an amount $\gg_{s,\varepsilon} 1$. On the other hand, the energy clearly ranges
between 0 and 1. Thus after replacing $\mathcal{B}'$ with $\mathcal{B}''$ and
iterating this algorithm at most $O_{s,\varepsilon}(1)$ times we obtain the
claim. □
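
The iteration in this proof can be imitated by a toy energy-increment loop. The sketch below is an analogue under loud assumptions, not the argument itself: a finite family of candidate sets (here residue classes, chosen arbitrarily) plays the role of the s-measurable sets, and "correlation above `eps` with some candidate" replaces "large $U^{s+1}$ norm". Each refinement step raises the energy by more than `eps**2`, so the loop stops after at most `1 / eps**2` steps.

```python
def cond_exp(f, cells):
    """E(f|B) for a partition given as a list of index lists."""
    out = [0.0] * len(f)
    for cell in cells:
        avg = sum(f[n] for n in cell) / len(cell)
        for n in cell:
            out[n] = avg
    return out

def energy(f, cells):
    return sum(v * v for v in cond_exp(f, cells)) / len(f)

def refine(cells, E):
    """Partition generated by the current cells and the set E."""
    E = set(E)
    parts = []
    for cell in cells:
        for part in ([n for n in cell if n in E], [n for n in cell if n not in E]):
            if part:
                parts.append(part)
    return parts

def weak_regularise(f, candidates, eps):
    """Refine the trivial partition until no candidate set correlates with
    f - E(f|B) by more than eps; returns (partition, number of steps)."""
    N = len(f)
    cells = [list(range(N))]
    steps = 0
    while True:
        residual = [a - b for a, b in zip(f, cond_exp(f, cells))]
        witness = next(
            (E for E in candidates if abs(sum(residual[n] for n in E)) / N > eps),
            None,
        )
        if witness is None:
            return cells, steps
        cells = refine(cells, witness)  # energy rises by more than eps**2
        steps += 1

N = 24
f = [1.0 if n % 4 == 0 or n % 6 == 1 else 0.0 for n in range(N)]
candidates = [[n for n in range(N) if n % q == a] for q in (2, 3, 4, 6) for a in range(q)]
eps = 0.05
cells, steps = weak_regularise(f, candidates, eps)
```

The bound on the number of steps comes from the same Cauchy-Schwarz and Pythagoras computation as in Corollary 2.5, since the energy never exceeds 1.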

One final iteration then gives the full non-irrational regularity lemma. 

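Proposition 2.7. Let $f : [N] \to [0, 1]$, let $s \geq 1$, let
$\varepsilon > 0$, and let $\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a
growth function. Then there exists a quantity
$M = O_{s,\varepsilon,\mathcal{F}}(1)$ and a decomposition

\[ f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \]

of $f$ into functions
$f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}} : [N] \to [-1, 1]$ such
that:

(i) ($f_{\mathrm{nil}}$ structured) $f_{\mathrm{nil}}$ equals a degree
$\leq s$ polynomial nilsequence of complexity $\leq M$.

(ii) ($f_{\mathrm{sml}}$ small) $\|f_{\mathrm{sml}}\|_{L^2[N]} \leq \varepsilon$.

(iii) ($f_{\mathrm{unf}}$ very uniform)
$\|f_{\mathrm{unf}}\|_{U^{s+1}[N]} \leq 1/\mathcal{F}(M)$.

(iv) (Nonnegativity) $f_{\mathrm{nil}}$ and
$f_{\mathrm{nil}} + f_{\mathrm{sml}}$ take values in $[0, 1]$.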

Proof. We need a growth function
$\mathcal{F}' : \mathbb{R}^+ \to \mathbb{R}^+$, somewhat more rapidly growing
than $\mathcal{F}$ in a manner that depends on $\mathcal{F}, s, \varepsilon$.
We will specify the exact requirements we have of it later. We then define a
sequence $1 = M_0 \leq M_1 \leq \ldots$ by setting $M_0 := 1$ and
$M_{i+1} := \mathcal{F}'(M_i)$.

Applying Corollary 2.6 repeatedly, we may find for each $i \geq 0$ an s-factor
$\mathcal{B}_i$ of complexity $O_{s,M_i}(1)$ and a growth function depending on
$s, M_i$, such that each $\mathcal{B}_i$ refines $\mathcal{B}_{i-1}$, and such
that

\[ \|f - \mathbb{E}(f|\mathcal{B}_i)\|_{U^{s+1}[N]} \leq 1/M_i \]

for all $i \geq 0$.

By Pythagoras' theorem, the energies $\mathcal{E}(\mathcal{B}_i)$ are
non-decreasing, and also range between 0 and 1. Thus by the pigeonhole
principle, one can find $i = O_\varepsilon(1)$ such that

\[ \mathcal{E}(\mathcal{B}_{i+1}) - \mathcal{E}(\mathcal{B}_i) \leq \varepsilon^2/4, \]

which by Pythagoras' theorem again is equivalent to

\[ \|\mathbb{E}(f|\mathcal{B}_{i+1}) - \mathbb{E}(f|\mathcal{B}_i)\|_{L^2[N]} \leq \varepsilon/2. \]

Meanwhile, as $\mathcal{B}_i$ is an s-factor and $f$ is bounded, we can find a
degree $\leq s$ polynomial nilsequence $f_{\mathrm{nil}} : [N] \to \mathbb{R}$
of complexity $O_{s,M_i}(1)$ such that

\[ \|\mathbb{E}(f|\mathcal{B}_i) - f_{\mathrm{nil}}\|_{L^2[N]} \leq \varepsilon/2. \]

Since $\mathbb{E}(f|\mathcal{B}_i)$ ranges in $[0, 1]$, we may retract
$f_{\mathrm{nil}}$ to $[0, 1]$ also (note that this does not increase the
complexity of $f_{\mathrm{nil}}$). If we then set
$f_{\mathrm{unf}} := f - \mathbb{E}(f|\mathcal{B}_{i+1})$ and
$f_{\mathrm{sml}} := \mathbb{E}(f|\mathcal{B}_{i+1}) - f_{\mathrm{nil}}$, we
obtain the claim. □
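
The pigeonhole step in this proof is elementary and can be checked on any concrete sequence. The sketch below uses a made-up non-decreasing sequence of energies in $[0, 1]$ (purely hypothetical data) and locates the first index at which the increment is at most $\varepsilon^2/4$; since the increments sum to at most 1, such an index must occur among the first $\lfloor 4/\varepsilon^2 \rfloor + 1$ of them.

```python
def first_small_increment(energies, eps):
    """Return the least i with energies[i+1] - energies[i] <= eps**2 / 4,
    or None if the supplied list is too short to contain one."""
    for i in range(len(energies) - 1):
        if energies[i + 1] - energies[i] <= eps**2 / 4:
            return i
    return None

eps = 0.5
# Hypothetical energies E(B_0) <= E(B_1) <= ... in [0, 1].
energies = [0.0, 0.3, 0.55, 0.7, 0.72, 0.9, 0.95]
i = first_small_increment(energies, eps)
```

The index found this way depends only on $\varepsilon$ and not on how fast the complexities $M_i$ grow, which is why the growth function $\mathcal{F}'$ can be chosen freely.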

Remark. The application of the Hardy-Littlewood maximal inequality in
the proof of Corollary 2.3 makes for a reasonably tidy argument. A more
direct approach would be to carve up $[N]$ into approximate level sets of
nilsequences, and then to approximate the projections onto the factors thus
defined by nilsequences using the Weierstrass approximation theorem. There
are a number of technicalities involved in this approach, chiefly involving the
need to choose the approximate level sets randomly. This kind of argument
was employed, in a closely related context, in [27, Chapter 7]. One can also
utilise arguments based on the Hahn-Banach theorem instead; see [21],
[46], and [22], [23], [24].

Obtaining irrationality. Our task now is to replace the nilsequence
$f_{\mathrm{nil}}$ appearing in Proposition 2.7 with a highly "irrational"
nilsequence as advertised in the statement of our main theorem, Theorem 1.2.
It turns out to be sufficient to establish the following claim.

Proposition 2.8. Let $s, M_0 \geq 1$, let $\mathcal{F}$ be a growth function,
and let $f : \mathbb{Z} \to [0, 1]$ be a degree $\leq s$ nilsequence of
complexity $\leq M_0$. Then there exists an $M = O_{s,M_0,\mathcal{F}}(1)$
such that $f$ (when restricted to $[N]$) is also a
$(\mathcal{F}(M), N)$-irrational degree $\leq s$ virtual nilsequence of
complexity $\leq M$ at scale $N$.

To establish Theorem 1.2 from this and Proposition 2.7, one first applies
the latter result with $\mathcal{F}$ replaced by a much more rapid growth
function $\mathcal{F}'$, and then one applies Proposition 2.8 to the structured
component $f_{\mathrm{nil}}$ obtained in Proposition 2.7.

It remains to prove Proposition 2.8. Let $s, M_0, \mathcal{F}$ be as in that
proposition, and write $\psi$ for the nilsequence considered there. By
definition, we have $\psi(n) = F_0(g_0(n)\Gamma)$ for some degree $\leq s$
filtered nilmanifold $(G/\Gamma, G_\bullet)$ of complexity $\leq M_0$, a
polynomial sequence $g_0 \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$, and a
function $F_0 : G/\Gamma \to \mathbb{C}$ which has a Lipschitz norm of at most
$M_0$. Since $\psi$ takes values in $[0, 1]$, we may assume without loss of
generality that $F_0$ is real, and by replacing $F_0$ with the retraction
$\max(\min(F_0, 1), 0)$ to $[0, 1]$ if necessary, we may assume that $F_0$ also
takes values in $[0, 1]$. Henceforth $(G/\Gamma, G_\bullet)$, $g_0$, and $F_0$
are fixed.

Factorisation results. One of the main results of our paper [30] was
a decomposition of an arbitrary polynomial nilsequence $g$ on $G/\Gamma$ into a
product $\beta g' \gamma$, where $\beta$ is "smooth", $\gamma$ is "rational",
and $g'(n)\Gamma'$ is equidistributed inside some possibly smaller nilmanifold
$G'/\Gamma'$. We need a similar result here, but with $g'$ having the somewhat
stronger property of being irrational that we mentioned in the introduction.
The notion of irrationality is discussed in more detail in Appendix A.

We will also be using the notions of smooth and rational polynomial
sequences from [30]. Again, the basic definitions and properties of these
concepts are recalled in Appendix A.

Define a complexity $\leq M$ subnilmanifold of $(G/\Gamma, G_\bullet)$ to be a
degree $\leq s$ filtered nilmanifold $(G'/\Gamma', G'_\bullet)$ of complexity
$\leq M$, where each subgroup $G'_{(i)}$ in the filtration $G'_\bullet$ is a
rational subgroup of the associated subgroup $G_{(i)}$ of complexity $\leq M$,
$\Gamma' = G' \cap \Gamma$, and each element of the Mal'cev basis of
$(G'/\Gamma', G'_\bullet)$ is a rational linear combination of the Mal'cev
basis of $(G/\Gamma, G_\bullet)$, where the coefficients all have height
$\leq M$. We define the total dimension of such a nilmanifold to be the
quantity $\sum_{i=0}^{s} \dim(G'_{(i)})$; this is also the dimension of
$\mathrm{poly}(\mathbb{Z}, G'_\bullet)$ (thanks to the Taylor series expansion,
Lemma A.1).

In our paper [30] the letter $\varepsilon$ was used for a smooth nilsequence,
but we use $\beta$ here to avoid conflict with various uses of $\varepsilon$
to denote a small positive real number.

We make the easy remark that if $(G'/\Gamma', G'_\bullet)$ is a complexity
$\leq M$ subnilmanifold of $(G/\Gamma, G_\bullet)$ for some $M \geq M_0$, and
$(G''/\Gamma'', G''_\bullet)$ is a complexity $\leq M$ subnilmanifold of
$(G'/\Gamma', G'_\bullet)$, then $(G''/\Gamma'', G''_\bullet)$ is a complexity
$O_M(1)$ subnilmanifold of $(G/\Gamma, G_\bullet)$.

Our first lemma is very similar in form to [30, Lemma 7.9].

Lemma 2.9 (Initial factorisation). Let $(G'/\Gamma', G'_\bullet)$ be a
complexity $\leq M$ subnilmanifold of $(G/\Gamma, G_\bullet)$ for some
$M \geq M_0$, let $g' \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$, and let
$A > 0$ and $N \geq 1$. Then at least one of the following statements holds:

(Irrationality) $g'$ is $(A, N)$-irrational in $(G'/\Gamma', G'_\bullet)$.

(Dimension reduction) There exists a factorisation

\[ g' = \beta g'' \gamma \]

where $\beta \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is
$(O_{M,A}(1), N)$-smooth, $g'' \in \mathrm{poly}(\mathbb{Z}, G''_\bullet)$
takes values in a subnilmanifold $(G''/\Gamma'', G''_\bullet)$ of
$(G'/\Gamma', G'_\bullet)$ of strictly smaller total dimension and of
complexity $O_{M,A}(1)$, and
$\gamma \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is $O_{M,A}(1)$-rational.

Proof. To make this proof a little more readable, we drop one dash from
every expression. Thus $g'$ becomes $g$, $G''$ becomes $G'$, and so on. Suppose
that $g$ is not $(A, N)$-irrational. Recall (see Lemma A.1) that $g$ has a
Taylor expansion that we may write in the form

\[ g(n) = g_0 g_1^{\binom{n}{1}} g_2^{\binom{n}{2}} \cdots g_s^{\binom{n}{s}}, \]

where $g_i \in G_{(i)}$ for each $i$. It follows from Lemma A.7 that for some
$i$, $1 \leq i \leq s$, we can factorise

\[ g_i = \beta_i g'_i \gamma_i \]

where $g'_i \in G_{(i)}$ lies in the kernel of some horizontal character
$\xi_i : G_{(i)} \to \mathbb{R}$ of complexity $O_{A,M}(1)$,
$\gamma_i \in G_{(i)}$ is $O_{A,M}(1)$-rational in the sense that
$\gamma_i^m \in \Gamma_{(i)}$ for some $m = O_{A,M}(1)$, and
$\beta_i \in G_{(i)}$ has distance $O_{A,M}(1/N^i)$ from the origin.

We now divide into two cases, depending on whether $i > 1$ or $i = 1$. First
suppose that $i > 1$. Then the Taylor expansion of $g$ reads, with an obvious
notation,

\[ g(n) = g_{<i}(n) (\beta_i g'_i \gamma_i)^{\binom{n}{i}} g_{>i}(n). \]

By commuting all the $\beta_i$ to the left and all the $\gamma_i$ to the
right, and using the group properties of polynomial sequences (Theorem 1.6),
one can rewrite this as

\[ g(n) = \beta_i^{\binom{n}{i}} g'(n) \gamma_i^{\binom{n}{i}} \]

where

\[ g'(n) := g_{<i}(n) (g'_i)^{\binom{n}{i}} g'_{>i}(n) \]

and $g'_{>i}(n)$ is another polynomial sequence taking values in $G_{(i+1)}$.
Observe that $g'$ is then a polynomial sequence adapted to the subnilmanifold
$(G'/\Gamma', G'_\bullet)$, where $G'/\Gamma' = G/\Gamma$ and
$G'_{(j)} = G_{(j)}$ for $j \neq i$, but $G'_{(i)} = \ker(\xi_i)$. This is
indeed a subnilmanifold, with complexity $O_{A,M}(1)$; note that
$(G'_{(j)})_{j \geq 0}$ is a filtration, thanks to our insistence in the
definition of $i$-horizontal character (cf. Definition A.6) that
$[G_{(j)}, G_{(i-j)}] \subseteq \ker(\xi_i)$ for all $0 \leq j \leq i$.
Meanwhile, $\beta_i^{\binom{n}{i}}$ is an $(O_{A,M}(1), N)$-smooth sequence and
$\gamma_i^{\binom{n}{i}}$ is an $O_{A,M}(1)$-rational sequence, so we have the
desired factorisation in the $i > 1$ case.

When $i = 1$, the above argument does not quite work, because $G'_{(1)}$ would
be distinct from $G'_{(0)}$, and would thus not qualify as a filtration. But
this can be easily remedied by performing an additional factorisation

\[ g_0 = \beta_0 g'_0 \]

where $\beta_0 \in G$ is a distance $O_{A,M}(1)$ from the identity, and $g'_0$
lies in the kernel of $\xi_1$. This leads to a factorisation of the form

\[ g(n) = \beta_0 \beta_1^{n} g'(n) \gamma_1^{n} \]

where

\[ g'(n) = g'_0 (g'_1)^{n} g'_{>1}(n) \]

and $g'_{>1}$ is a polynomial sequence taking values in $G_{(2)}$. One then
argues as before, but now one sets both $G'_{(0)}$ and $G'_{(1)}$ equal to the
kernel of $\xi_1$. □

We can iterate the above lemma to obtain the following result, which
is analogous to [30, Theorem 1.19]. Apart from dealing with irrationality
rather than equidistribution, the following result is somewhat different to
that just cited in that one requires an arbitrary (rather than polynomial)
growth function, but one does not (of course) need polynomial complexity
bounds. A variant of [30, Theorem 1.19] was also given in [33, Theorem 4.2].

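Lemma 2.10 (Complete factorisation). Let $(G/\Gamma, G_\bullet)$ be a degree
$\leq s$ filtered nilmanifold of complexity $\leq M_0$, and let
$g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$. For any growth function
$\mathcal{F}'$, we can find a quantity
$M_0 \leq M \leq O_{M_0,\mathcal{F}'}(1)$ and a factorisation
$g = \beta g' \gamma$ where:

(i) $\beta \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $(O_M(1), N)$-smooth;

(ii) $g' \in \mathrm{poly}(\mathbb{Z}, G'_\bullet)$ is
$(\mathcal{F}'(M), N)$-irrational in a subnilmanifold
$(G'/\Gamma', G'_\bullet)$ of $(G/\Gamma, G_\bullet)$ of complexity $O_M(1)$,
and

(iii) $\gamma \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $O_M(1)$-periodic.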

Proof. We use an iterative argument, setting $\beta = \gamma = \mathrm{id}$,
$g' = g$, $M = M_0$, and $(G'/\Gamma', G'_\bullet) = (G/\Gamma, G_\bullet)$ to
begin with. In particular, $(G'/\Gamma', G'_\bullet)$ is initially a
subnilmanifold of $(G/\Gamma, G_\bullet)$ of complexity $O_M(1)$. If $g'$ is
$(\mathcal{F}'(M), N)$-irrational in $(G'/\Gamma', G'_\bullet)$ then we are
done; otherwise, by Lemma 2.9, we may factorise $g' = \beta' g'' \gamma'$ where
$\beta'$ is $(O_{M,\mathcal{F}'(M)}(1), N)$-smooth, $\gamma'$ is
$O_{M,\mathcal{F}'(M)}(1)$-periodic, and $g''$ now takes values in a
subnilmanifold $(G''/\Gamma'', G''_\bullet)$ of $(G'/\Gamma', G'_\bullet)$ of
complexity $O_{M,\mathcal{F}'(M)}(1)$ and smaller total dimension than
$(G'/\Gamma', G'_\bullet)$. We then replace $\beta$ by $\beta\beta'$, $\gamma$
by $\gamma'\gamma$, $g'$ by $g''$, $(G'/\Gamma', G'_\bullet)$ by
$(G''/\Gamma'', G''_\bullet)$, and increase $M$ to a quantity of the form
$O_{M,\mathcal{F}'(M)}(1)$, using Lemma A.4 to conclude that the new $\beta$ is
smooth and the new $\gamma$ is rational. We then iterate this process. Since
the total dimension of $(G/\Gamma, G_\bullet)$ is initially $O_{M_0}(1)$, this
process can iterate at most $O_{M_0}(1)$ times, and the claim follows. □

With this lemma we can now establish Proposition 2.8 and hence Theorem 1.2.
Let $\mathcal{F}'$ be a rapid growth function (depending on
$s, M_0, \mathcal{F}$) to be chosen later. We apply Lemma 2.10, obtaining some
$M$ with $M_0 \leq M \leq O_{M_0,\mathcal{F}'}(1)$ and a factorisation

\[ \psi(n) = F_0(\beta(n) g'(n) \gamma(n) \Gamma) \]

with $\beta$, $g'$ and $\gamma$ having the properties described in that lemma.

The sequence $\gamma$ is $O_M(1)$-rational and so, by Lemma A.4, the orbit
$n \mapsto \gamma(n)\Gamma$ is periodic with some period $q = O_M(1)$, and thus
$\gamma(n)\Gamma$ depends only on $n \bmod q$.

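For each $n$, the rationality of $\gamma(n)$ ensures that
$\gamma(n)\Gamma\gamma(n)^{-1}$ intersects $\Gamma$ in a subgroup of $\Gamma$
of index $O_M(1)$. Since there are only $O_M(1)$ different possible values of
$\gamma(n)\Gamma$, we may thus find a subgroup $\Gamma'$ of $\Gamma$ of index
$O_M(1)$ such that $\Gamma' \subseteq \gamma(n)\Gamma\gamma(n)^{-1}$ for all
$n$.

We can thus express $\psi$ as a virtual nilsequence

\[ \psi(n) = \tilde F(g'(n)\Gamma', n \bmod q, n/N) \]

where $\tilde F$ is the function on
$G/\Gamma' \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R}$ defined by the
formula

\[ \tilde F(x, a, y) := F_0(\beta(Ny)\, \tilde x\, \gamma(\tilde a) \Gamma) \]

whenever $y \in \frac{1}{N}\mathbb{Z}$, and by Lipschitz extension to all
$y \in \mathbb{R}$, where $\tilde a$ is any integer with
$a = \tilde a \bmod q$, and $\tilde x$ is any element of $G$ such that
$\tilde x \Gamma' = x$. One easily verifies that $\tilde F$ is well-defined
and has a Lipschitz norm of $O_M(1)$. Also, since $g'$ was already
$(\mathcal{F}'(M), N)$-irrational in $G/\Gamma$, and $\Gamma'$ has index
$O_M(1)$ in $\Gamma$, we see that $g'$ is
$(\gg_M \mathcal{F}'(M), N)$-irrational in $G/\Gamma'$. Proposition 2.8 now
follows by replacing $M$ by a suitable quantity of the form $O_M(1)$, and
choosing $\mathcal{F}'$ sufficiently rapidly growing depending on
$\mathcal{F}$.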

3. Proof of the counting lemma 

The purpose of this section is to prove the counting lemma, Theorem 1.11.
We begin by recalling from the introduction the definition of the Leibman
group $G^\Psi$.

Definition 3.1 (The Leibman group). Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a
collection of linear forms
$\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. For any $i \geq 1$,
define $\Psi^{[i]}$ to be the linear subspace of $\mathbb{R}^t$ spanned by the
vectors $(\psi_1^j(\mathbf{n}), \ldots, \psi_t^j(\mathbf{n}))$ for
$1 \leq j \leq i$ and $\mathbf{n} \in \mathbb{Z}^D$. Given a filtered
nilmanifold $(G/\Gamma, G_\bullet)$, we define the Leibman group
$G^\Psi \leq G^t$ to be the Lie subgroup of $G^t$ generated by the elements
$g_i^{\vec v_i}$ for $i \geq 1$, $g_i \in G_{(i)}$, and
$\vec v_i \in \Psi^{[i]}$, with the convention that if
$\vec v = (v_1, \ldots, v_t)$ then

\[ g^{\vec v} := (g^{v_1}, \ldots, g^{v_t}). \]

Now might be a good time to remark explicitly that we have introduced
a slightly vulgar convention that we hope will help the reader follow this
section and other parts of the paper. Bold font letters such as
$\mathbf{n} \in \mathbb{Z}^D$ denote $D$-dimensional vectors, whilst arrows
such as $\vec v \in \mathbb{R}^t$ denote $t$-vectors. Occasionally we shall
write $m_i := \dim(\Psi^{[i]})$.

When reading this section, it might be found helpful to have a running
example in mind. We will take as an illustrative example the case $D = 2$,
$t = 4$ and $\Psi = (\psi_0, \ldots, \psi_3)$, where
$\psi_i(\mathbf{n}) = n_1 + i n_2$ for $i = 0, 1, 2, 3$.

The system $\Psi$, of course, defines a 4-term arithmetic progression. As we
remarked in the introduction, the corresponding Leibman group $G^\Psi$ is also
known as the Hall-Petresco group $HP^4(G)$. The reader will easily confirm
that in this case we have

\[ \Psi^{[1]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3) \]

and

\[ \Psi^{[2]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3) \oplus \mathbb{R}(0, 0, 1, 3) \]

and

\[ \Psi^{[3]} = \mathbb{R}(1, 1, 1, 1) \oplus \mathbb{R}(0, 1, 2, 3) \oplus \mathbb{R}(0, 0, 1, 3) \oplus \mathbb{R}(0, 0, 0, 1) = \mathbb{R}^4. \]

Some work must be done before we can describe $G^\Psi = HP^4(G)$ in a pleasant
way. However we can already establish the following lemma, whose statement
and proof go some way towards explaining the introduction of the Leibman
group.
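
These dimensions can be confirmed mechanically. The sketch below is illustrative and makes one simplifying assumption: it spans only the vectors coming from $\mathbf{n}$ in a small box rather than all of $\mathbb{Z}^2$, which already suffices to reach the full rank here. The helper names are not from the paper; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of an integer matrix via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def psi_vectors(max_power, box=2):
    """Vectors (psi_i(n)^j)_{i=0..3} for 1 <= j <= max_power and n in a small
    box, where psi_i(n) = n_1 + i*n_2 is the 4-term progression system."""
    vecs = []
    for n1, n2 in product(range(-box, box + 1), repeat=2):
        for j in range(1, max_power + 1):
            vecs.append([(n1 + i * n2) ** j for i in range(4)])
    return vecs

dims = [rank(psi_vectors(k)) for k in (1, 2, 3)]
# Adjoining (0, 0, 1, 3) to the degree-2 span should not raise the rank.
rank_with_v3 = rank(psi_vectors(2) + [[0, 0, 1, 3]])
```

The list `dims` records the dimensions of $\Psi^{[1]}, \Psi^{[2]}, \Psi^{[3]}$ for this system, and the last line checks that $(0, 0, 1, 3)$ does lie in $\Psi^{[2]}$.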

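Lemma 3.2. Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a collection of linear
forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$. Suppose that
$(G/\Gamma, G_\bullet)$ is a filtered nilmanifold and that
$g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is a polynomial sequence. Then the
sequence $g^\Psi : \mathbb{Z}^D \to G^t$ defined by
$g^\Psi(\mathbf{n}) := (g(\psi_1(\mathbf{n})), \ldots, g(\psi_t(\mathbf{n})))$
takes values in $G^\Psi$.

Proof. The sequence $g(n)$ has a (unique) Taylor expansion

\[ g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}} \]

with $g_i \in G_{(i)}$ for all $i$ (see Lemma A.1). Substituting in, it follows
that

\[ g^\Psi(\mathbf{n}) = \prod_{i=0}^{s} g_i^{\binom{\Psi(\mathbf{n})}{i}}, \]

with the binomial coefficient applied coordinatewise, and it is immediate from
the definition that each element in this product lies in $G^\Psi$. □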

The counting lemma, whose proof is the main objective of this section,
was stated as Theorem 1.11. Essentially, it states that
$g^\Psi(\mathbf{n})\Gamma^\Psi$ is equidistributed in $G^\Psi/\Gamma^\Psi$ as
$\mathbf{n}$ ranges over "nice" subsets of "big" lattices, provided that the
original sequence $g$ is suitably irrational. We will recall what that
means in due course, but our first task is to develop the basic theory of the
Leibman group $G^\Psi$. At the moment, for example, we have not established
that $G^\Psi$ is a connected Lie subgroup of $G^t$ or that
$G^\Psi/\Gamma^\Psi$ has the structure of a filtered nilmanifold. Nor have we
developed tools for calculating inside this group.

Basic facts about the Leibman group and nilmanifold. We can
endow $\mathbb{R}^t$ with the structure of a commutative algebra over
$\mathbb{R}$ by using the pointwise product

\[ \vec x \cdot \vec y = (x_1 y_1, \ldots, x_t y_t) \]

and setting $\vec 1 = (1, \ldots, 1)$ to be the multiplicative identity. With
this algebra structure, one can view the spaces $\Psi^{[i]}$ defined in
Definition 3.1 as the span of the powers $\Psi(\mathbf{n})^j$ for
$\mathbf{n} \in \mathbb{Z}^D$ and $1 \leq j \leq i$, where we view $\Psi$ as a
homomorphism from $\mathbb{Z}^D$ to $\mathbb{Z}^t$. We have the following
alternate description of the $\Psi^{[i]}$.

Lemma 3.3 (Depolarisation). $\Psi^{[i]}$ is the span of the products

\[ \Psi(\mathbf{n}_1) \cdots \Psi(\mathbf{n}_j), \]

where $1 \leq j \leq i$ and
$\mathbf{n}_1, \ldots, \mathbf{n}_j \in \mathbb{Z}^D$.

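Proof. Clearly $\Psi^{[i]}$ is contained in this span. To establish the
reverse containment, we observe the elementary depolarisation identity

\[ \Psi(\mathbf{n}_1) \cdots \Psi(\mathbf{n}_j) = \frac{(-1)^j}{j!} \sum_{\omega \in \{0,1\}^j} (-1)^{|\omega|} \Psi(\omega_1 \mathbf{n}_1 + \ldots + \omega_j \mathbf{n}_j)^j, \]

where $\omega = (\omega_1, \ldots, \omega_j)$ and
$|\omega| := \omega_1 + \ldots + \omega_j$, and the claim follows. □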
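
The depolarisation identity is easy to test numerically. The following sketch checks it exactly for the 4-term progression system of the running example and one arbitrary choice of $\mathbf{n}_1, \mathbf{n}_2, \mathbf{n}_3$ (both are assumptions made purely for illustration; the function names are hypothetical).

```python
from fractions import Fraction
from itertools import product
from math import factorial

def Psi(n):
    """The 4-term progression system: Psi(n) = (n_1 + i*n_2) for i = 0..3."""
    return tuple(n[0] + i * n[1] for i in range(4))

def pointwise_product(vectors):
    """Product in the algebra R^t with coordinatewise multiplication."""
    out = [1] * 4
    for v in vectors:
        out = [a * b for a, b in zip(out, v)]
    return tuple(out)

def depolarised(ns):
    """Right-hand side of the depolarisation identity for n_1, ..., n_j."""
    j = len(ns)
    total = [Fraction(0)] * 4
    for omega in product((0, 1), repeat=j):
        # omega_1 n_1 + ... + omega_j n_j, computed coordinatewise in Z^2
        m = tuple(sum(w * n[c] for w, n in zip(omega, ns)) for c in range(2))
        sign = (-1) ** sum(omega)
        total = [t + sign * Fraction(x) ** j for t, x in zip(total, Psi(m))]
    scale = Fraction((-1) ** j, factorial(j))
    return tuple(scale * t for t in total)

ns = [(1, 2), (-3, 1), (2, 5)]
lhs = pointwise_product([Psi(n) for n in ns])
rhs = depolarised(ns)
```

Because $\Psi$ is linear and the product is coordinatewise, the identity reduces in each coordinate to the scalar polarisation formula for $x_1 \cdots x_j$, which is what the comparison of `lhs` and `rhs` exercises.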

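As an immediate consequence we have

Corollary 3.4 (Filtration property). For any $i, j \geq 0$, we have
$\Psi^{[i]} \cdot \Psi^{[j]} \subseteq \Psi^{[i+j]}$.

Let $(G/\Gamma, G_\bullet)$ be a degree $\leq s$ filtered nilmanifold. From
Definition 1.10, the Leibman group $G^\Psi$ is the subgroup of $G^t$ generated
by the group elements $g_i^{\vec v_i}$ for $i \geq 1$,
$\vec v_i \in \Psi^{[i]}$ and $g_i \in G_{(i)}$. For any $i_0 \geq 1$, let
$G^\Psi_{(i_0)}$ be the subgroup of $G^\Psi$ generated by those
$g_i^{\vec v_i}$ with $i \geq i_0$, $\vec v_i \in \Psi^{[i]}$,
$g_i \in G_{(i)}$, with the convention that $G^\Psi_{(0)} := G^\Psi$.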

Lemma 3.5 (Filtration property for $G^\Psi$).
$G^\Psi_\bullet := (G^\Psi_{(i)})_{i \geq 0}$ is a filtration on $G^\Psi$. In
other words, the $G^\Psi_{(i)}$ are nested with
$[G^\Psi_{(i)}, G^\Psi_{(j)}] \subseteq G^\Psi_{(i+j)}$ for all
$i, j \geq 0$.

Proof. It suffices to check that if $g_i \in G_{(i)}$, $g_j \in G_{(j)}$,
$\vec v_i = (v_{i1}, \ldots, v_{it}) \in \Psi^{[i]}$ and
$\vec v_j = (v_{j1}, \ldots, v_{jt}) \in \Psi^{[j]}$ then
$[g_i^{\vec v_i}, g_j^{\vec v_j}] \in G^\Psi_{(i+j)}$. But this follows from
the Baker-Campbell-Hausdorff formula (see (C.2)), the filtration property
of $G_{(i)}$ and Corollary 3.4. □

The spaces $\Psi^{[i]}$ form a flag

\[ \Psi^{[1]} \leq \Psi^{[2]} \leq \ldots \leq \Psi^{[s]} \leq \mathbb{R}^t \]

of subspaces which are rational (i.e. they can be defined over $\mathbb{Q}$).
From a greedy algorithm (and clearing denominators) we may thus find a basis
$\vec v_1, \ldots, \vec v_{m_s} \in \Psi^{[s]}$ with the following properties:

(i) (Integrality) $\vec v_1, \ldots, \vec v_{m_s}$ all lie in $\mathbb{Z}^t$;

(ii) (Partial span) For every $1 \leq i \leq s$,
$\vec v_1, \ldots, \vec v_{m_i}$ span $\Psi^{[i]}$;

(iii) (Row echelon form) For each $1 \leq j \leq m_s$, there exists $l_j$,
$1 \leq l_j \leq t$, such that $\vec v_j$ has a non-zero $l_j$ coordinate, but
such that $\vec v_{j'}$ has a zero $l_j$ coordinate for all
$j < j' \leq m_s$.

For instance, the basis

\[ \vec v_1 := (1, 1, 1, 1); \quad \vec v_2 := (0, 1, 2, 3); \quad \vec v_3 := (0, 0, 1, 3); \quad \vec v_4 := (0, 0, 0, 1) \]

we implicitly gave above for our running example is already in this form.

Fix such a basis. For each basis element $\vec v_j$, we can define the degree
$\deg(\vec v_j)$ of that element to be the first $i$ for which $j \leq m_i$;
thus $\deg(\vec v_j)$ is an integer between 1 and $s$, and
$\vec v_j \in \Psi^{[\deg(\vec v_j)]}$.

Observe that an arbitrary element of $G^\Psi$ can be expressed as a product of
finitely many elements of the form $g_j^{\vec v_j}$ for $1 \leq j \leq m_s$
and $g_j \in G_{(\deg(\vec v_j))}$. By many applications of the
Baker-Campbell-Hausdorff formula (see (C.1)) and Lemma 3.5, we can now express
any element of $G^\Psi$ in the form

\[ \prod_{j=1}^{m_s} g_j^{\vec v_j} \qquad (3.1) \]

where $g_j \in G_{(\deg(\vec v_j))}$ for all $1 \leq j \leq m_s$.

Thus, in our running example, we have the explicit description of
$G^\Psi = HP^4(G)$ as

\[ \{(g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3 g_3) : g_0 \in G_{(0)}, g_1 \in G_{(1)}, g_2 \in G_{(2)}, g_3 \in G_{(3)}\}. \]

Note that from results on the Taylor expansion (see Lemma A.1) this group
may also be identified as

\[ \{(g(0), g(1), g(2), g(3)) : g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)\}. \]

The group nature of $HP^4(G)$ is then easily deduced from Theorem 1.6, but
this presentation is somewhat specific to the Hall-Petresco case and we shall
not require it further.
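
The agreement between these two descriptions comes down to the values of $\binom{n}{i}$ at $n = 0, 1, 2, 3$, and can be checked in a concrete group. The sketch below uses $4 \times 4$ upper unitriangular integer matrices with the lower central series filtration as a stand-in for $G$ (an assumption for illustration only), with arbitrarily chosen Taylor coefficients.

```python
from math import comb

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matpow(A, e):
    """A**e for a non-negative integer exponent e."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        result = matmul(result, A)
    return result

def unitriangular(entries):
    """4x4 upper unitriangular integer matrix with given above-diagonal entries."""
    M = [[int(i == j) for j in range(4)] for i in range(4)]
    for (i, j), value in entries.items():
        M[i][j] = value
    return M

# Hypothetical Taylor coefficients g_i in G_(i).
g0 = unitriangular({(0, 1): 2, (1, 2): -1, (2, 3): 3, (0, 3): 1})
g1 = unitriangular({(0, 1): 1, (1, 2): 4, (2, 3): -2})
g2 = unitriangular({(0, 2): 5, (1, 3): -3})   # lies in [G, G]
g3 = unitriangular({(0, 3): 7})               # lies in the centre

def g(n):
    """g(n) = g0 * g1^C(n,1) * g2^C(n,2) * g3^C(n,3)."""
    out = g0
    for i, gi in enumerate((g1, g2, g3), start=1):
        out = matmul(out, matpow(gi, comb(n, i)))
    return out

orbit = tuple(g(n) for n in range(4))
explicit = (
    g0,
    matmul(g0, g1),
    matmul(matmul(g0, matpow(g1, 2)), g2),
    matmul(matmul(matmul(g0, matpow(g1, 3)), matpow(g2, 3)), g3),
)
```

The exponents appearing in `explicit` are exactly the coordinates of the basis vectors $(1,1,1,1)$, $(0,1,2,3)$, $(0,0,1,3)$, $(0,0,0,1)$, read off column by column.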



Indeed, one uses (C.1) and Lemma 3.5 to extract out and collect all terms with
degree $\deg(\vec v_j) = 1$, leaving only terms with base $g_j$ in $G_{(2)}$.
Then one extracts out those terms with degree 2 (merging them with the $i = 1$
terms as necessary), leaving only terms with base in $G_{(3)}$. Continuing this
process gives the desired factorisation.

From the row-echelon form one can verify inductively that the representation
(3.1) is unique (this can be seen clearly by working with the Hall-Petresco
example presented above). This gives $G^\Psi$ the structure of a connected,
simply connected Lie group, with dimension

\[ \dim(G^\Psi) = \sum_{i=1}^{s} \dim(G_{(i)}) (\dim(\Psi^{[i]}) - \dim(\Psi^{[i-1]})) \qquad (3.2) \]

(with the convention that $\Psi^{[0]}$ is trivial). A similar argument also
shows that every element of $G^\Psi_{(j_0)}$ can be expressed uniquely in the
form (3.1), where now $g_j$ is constrained to lie in
$G_{(\max(\deg(\vec v_j), j_0))}$ rather than $G_{(\deg(\vec v_j))}$.
In particular, by reading off the coefficients $g_j$ one at a time, this
implies the pleasant identity

\[ G^\Psi_{(j)} = G^\Psi \cap (G_{(j)})^t. \qquad (3.3) \]
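
As a small worked instance of the dimension formula (3.2), the sketch below evaluates it for the Heisenberg group with its lower central series ($\dim G_{(1)} = 3$, $\dim G_{(2)} = 1$, $s = 2$) and the 4-term progression system, for which $\dim \Psi^{[1]} = 2$ and $\dim \Psi^{[2]} = 3$. The choice of group is an assumption made for illustration, and the function name is hypothetical.

```python
def leibman_dimension(dim_G, dim_Psi):
    """dim(G^Psi) = sum over i of dim(G_(i)) * (dim(Psi^[i]) - dim(Psi^[i-1])),
    where dim_G[i-1] = dim(G_(i)), dim_Psi[i-1] = dim(Psi^[i]),
    and Psi^[0] is taken to be trivial."""
    total, previous = 0, 0
    for dG, dPsi in zip(dim_G, dim_Psi):
        total += dG * (dPsi - previous)
        previous = dPsi
    return total

# Heisenberg group, 4-term progressions: 3 * (2 - 0) + 1 * (3 - 2).
heisenberg_hp4 = leibman_dimension([3, 1], [2, 3])
```

The result agrees with a direct parameter count in the explicit description of $HP^4(G)$: $g_0$ and $g_1$ each range over the 3-dimensional group, $g_2$ ranges over the 1-dimensional centre, and $g_3$ is trivial.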

Remark. From Taylor expansion (see Lemma A.1) we see that the sequence
$g^\Psi$ in (1.7) lies in $\mathrm{poly}(\mathbb{Z}^D, G^\Psi_\bullet)$. While
we do not directly use this fact here, it may help explain why the filtration
$G^\Psi_\bullet$ will play a prominent role in the proof of the counting lemma
that we will shortly come to.

Recall that we normalised the basis vectors $\vec v_j \in \mathbb{Z}^t$ to
have integer coefficients. As a consequence, we see that if the $g_j$ are in
$\Gamma$, then the expression (3.1) lies in $\Gamma^t$. From this (and many
applications of Lemma 3.5) we see that
$\Gamma^\Psi_{(i)} := \Gamma^t \cap G^\Psi_{(i)}$ is cocompact in
$G^\Psi_{(i)}$ for each $i$, and so $(G^\Psi/\Gamma^\Psi, G^\Psi_\bullet)$
is a filtered nilmanifold. Furthermore, the same argument shows that the
$G^\Psi_{(i)}$ are rational subgroups of $G^t$ and so
$(G^\Psi/\Gamma^\Psi, G^\Psi_\bullet)$ is a subnilmanifold of
$(G^t/\Gamma^t, G^t_\bullet)$.

The counting lemma: preliminary manoeuvres. Now that we have
verified that $G^\Psi/\Gamma^\Psi$ is indeed a nilmanifold, we can begin the
proof of Theorem 1.11.

We begin with some easy reductions. First, observe that for fixed $M$, there
are only finitely many possibilities for $s, D, t, \Psi$, and (up to
isomorphism) there are only finitely many possibilities for
$(G/\Gamma, G_\bullet)$ and $\Gamma$. Thus it will suffice to establish the
result for a single choice of $s, D, t, \Psi, (G/\Gamma, G_\bullet)$, with
the bounds depending on these quantities. Hence, we fix these quantities
and allow all implicit constants to depend on these quantities (thus, in this
section, we will not explicitly subscript out $O(1)$ quantities).

Similarly, because the space of Lipschitz functions with Lipschitz norm
$O(1)$ is precompact in the uniform topology (by the Arzela-Ascoli theorem),
it suffices to prove the desired bound for each fixed $F$, as the uniformity in
$F$ then follows from an easy approximation argument. Thus we fix $F$ and
allow all quantities to depend on $F$.

Next, we observe that we may normalise $g(0) = \mathrm{id}$. Indeed, we may
factorise $g(0) = c_0 \gamma_0$ where $d_G(c_0, \mathrm{id}) = O(1)$ and
$\gamma_0 \in \Gamma$. Factorising, we obtain

\[ g(n) = c_0 g'(n) \gamma_0 \]

where $g'(n) := c_0^{-1} g(n) \gamma_0^{-1}$. Note that
$g'(0) = \mathrm{id}$ and that the Taylor coefficients of $g'$ are given by
$g'_i = \gamma_0 g_i \gamma_0^{-1}$, and so $g'$ is also
$(A, N)$-irrational. It is then an easy matter to see that Theorem 1.11 for
$g$ and $F$ follows from Theorem 1.11 for $g'$ and for the shifted function
$F'(x) := F(c_0 x)$, which is still Lipschitz with norm $O(1)$.

Note that we may assume that $A$ and $N$ are large, as the claim is trivial
otherwise.

Equidistribution in the Leibman group. Let us recall what we
are trying to prove. In the counting lemma, Theorem 1.11, our aim is to
show that if $g(n)$ is suitably irrational then the orbit
$(g^\Psi(\mathbf{n}))_{\mathbf{n} \in (\mathbf{n}_0 + \Lambda) \cap P}$ is
equidistributed on the Leibman nilmanifold $G^\Psi/\Gamma^\Psi$. We shall
proceed by contradiction, supposing this orbit is not equidistributed and
deducing that $g(n)$ could not have been irrational. The reader should recall
the definition of irrational in this context: it is given in Definition A.6.

Our main tool will be a mild generalisation of the "multiparameter Leibman
criterion", which is [30, Theorem 8.6]. Here is the statement we shall
use.

Theorem 3.6. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of
complexity $\leq M$ and that
$g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a polynomial sequence for
some $D \leq M$. Suppose that $\Lambda \subseteq \mathbb{Z}^D$ is a lattice of
index $\leq M$, that $\mathbf{n}_0 \in \mathbb{Z}^D$ has magnitude $\leq M$,
and that $P \subseteq [-N, N]^D$ is a convex body. Suppose that $\delta > 0$,
and that


for some Lipschitz function $F : G/\Gamma \to \mathbb{C}$. Then there is a
nontrivial homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$,
has complexity $O_{\delta,M}(1)$ and such that

\[ \|\eta \circ g\|_{C^\infty([N]^D)} = O_{\delta,M}(1). \]

Remarks. This differs from [30, Theorem 8.6] in several insubstantial
ways. On the one hand we have no concern here with the polynomial bounds
that were important in that setting. However, we are dealing here with a
sublattice $\Lambda \subseteq \mathbb{Z}^D$ rather than $\mathbb{Z}^D$ itself,
and with an arbitrary convex body $P$ rather than the box $[N]^D$. This more
general result can be deduced from [30, Theorem 8.6] in a somewhat routine,
though slightly tedious, manner. We sketch the details in Appendix B. The
notation $C^\infty([N]^D)$ is recalled both in the appendix and later in this
section.

Later on, the notation will get a little complicated. Let us, then, first
apply Theorem 3.6 to establish the following very simple special case of the
counting lemma (it is, of course, the special case in which $\Psi$ consists of
the single form $\psi_1(\mathbf{n}) = n_1$).

Lemma 3.7 (Irrational implies equidistributed). Suppose that
$(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of complexity at most $M$
and that $g : \mathbb{Z} \to G$ is an $(A, N)$-irrational polynomial sequence.
Then we have the equidistribution property

\[ \mathbb{E}_{n \in [N]} F(g(n)\Gamma) = \int_{G/\Gamma} F + O_M(A^{-c_M} \|F\|_{\mathrm{Lip}}) \]

for all Lipschitz $F : G/\Gamma \to \mathbb{C}$ and some $c_M > 0$.

Proof. Suppose the conclusion is false. Then by Theorem 3.6 there is some
continuous homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $[G, G]$
and $\Gamma$, has complexity $O_\delta(1)$, and for which
$\|\eta \circ g\|_{C^\infty[N]} \ll \delta^{-O(1)}$. Recall (cf.
[30, Definition 2.7]) what this means: in the Taylor expansion

\[ \eta \circ g(n) = \alpha_0 + \alpha_1 \binom{n}{1} + \ldots + \alpha_s \binom{n}{s}, \]

the $j$th coefficient $\alpha_j$ satisfies
$\|\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \ll \delta^{-O(1)}/N^j$ for
$j = 1, \ldots, s$. If the sequence $g$ is developed as a Taylor expansion

\[ g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}}, \]

then we of course have $\alpha_j = \eta(g_j)$. Choose $i$ maximal so that the
restriction $\eta|_{G_{(i)}}$ is nontrivial. Then certainly
$\|\eta(g_i)\|_{\mathbb{R}/\mathbb{Z}} \ll \delta^{-O(1)}/N^i$. We claim that
$\eta$ is an $i$-horizontal character in the sense of Definition A.5, a
statement which will clearly contradict the supposed $(A, N)$-irrationality of
$g$ if $\delta$ is a sufficiently small power of $1/A$. To this end all we need
do is confirm that $\eta$ vanishes on $G_{(i+1)}$, and on
$[G_{(j)}, G_{(i-j)}]$ for $0 \leq j \leq i$. The first of these follows from
the maximality of $i$, whilst the second and third follow immediately from the
properties of $\eta$ stated at the beginning of the proof. □

Let us turn now to the more notationally intensive general case. Now,
we apply Theorem 3.6 to $G^\Psi/\Gamma^\Psi$ to conclude that there is a
non-trivial continuous homomorphism $\eta : G^\Psi \to \mathbb{R}$ which maps
$\Gamma^\Psi$ to $\mathbb{Z}$, has complexity $O_\delta(1)$, and satisfies

\[ \|\eta \circ g^\Psi\|_{C^\infty([N]^D)} = O_\delta(1). \tag{3.4} \]

Much as in the proof of Lemma 3.7, what this means is that if $\eta \circ g^\Psi(n)$
is developed as a Taylor series in multi-binomial coefficients
$\binom{n}{j} = \binom{n_1}{j_1} \cdots \binom{n_D}{j_D}$ (see Lemma A.1), the
coefficient $\alpha_j$ satisfies
$\|\alpha_j\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-|j|}$. Our aim is
to use this information to contradict the assumption that $g(n)$ is
$(A, N)$-irrational.

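Let us once again take $i$ maximal such that $\eta|_{G^\Psi_{(i)}}$ is nontrivial.
Considering again the Taylor expansion of $g(n)$, we have

\[ (\eta \circ g^\Psi)(n) = \sum_{j=1}^{i} \eta\Bigl(g_j^{\binom{\psi_1(n)}{j}}, \dots, g_j^{\binom{\psi_t(n)}{j}}\Bigr). \tag{3.5} \]
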

^In fact here we only need the rather simpler 1-parameter version, which is
[30, Theorem 1.16].
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Take the basis vi,V2, ■ ■ ■ for vpW described earlier. Then, since the vector 

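\[ \Bigl(\binom{\psi_1(n)}{j}, \dots, \binom{\psi_t(n)}{j}\Bigr) \]

lies in $\Psi^{[j]}$, there is an expansion

\[ \Bigl(\binom{\psi_1(n)}{j}, \dots, \binom{\psi_t(n)}{j}\Bigr) = P_{j,1}(n) v_1 + \dots + P_{j,m_j}(n) v_{m_j} \tag{3.6} \]

for $j = 1, \dots, i$, where the $P_{j,k} : \mathbb{Z}^D \to \mathbb{R}$ are
polynomials of degree at most $j$, recalling that $m_j := \dim(\Psi^{[j]})$.
Comparing with (3.5), we obtain

\[ (\eta \circ g^\Psi)(n) = \sum_{j=1}^{i} \sum_{k=1}^{m_j} P_{j,k}(n)\, \eta(g_j^{v_k}). \tag{3.7} \]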

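We are going to look at the coefficients $\alpha_{\mathbf{i}}$ of (3.7) for the
monomial $n^{\mathbf{i}} := n_1^{i_1} \cdots n_D^{i_D}$, where
$\mathbf{i} = (i_1, \dots, i_D)$ and $|\mathbf{i}| := |i_1| + \dots + |i_D| = i$.
We are assuming that every such coefficient satisfies
$\|\alpha_{\mathbf{i}}\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-i}$. Note also
that

\[ \alpha_{\mathbf{i}} = \sum_{k=1}^{m_i} (P_{i,k})_{\mathbf{i}}\, \eta(g_i^{v_k}), \tag{3.8} \]

where $(P_{i,k})_{\mathbf{i}}$ is the $n^{\mathbf{i}}$ coefficient of $P_{i,k}(n)$;
this is because terms of total degree $i$ cannot arise from the terms
$j = 1, \dots, i-1$ in the sum on the right hand side of (3.7).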

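On the other hand by taking $j = i$ in (3.6) we have

\begin{align*}
(P_{i,1}(n))_{\mathbf{i}} v_1 + \dots + (P_{i,m_i}(n))_{\mathbf{i}} v_{m_i}
 &= \frac{1}{i_1! \cdots i_D!} \bigl(\psi_1(e_1)^{i_1} \cdots \psi_1(e_D)^{i_D}, \dots, \psi_t(e_1)^{i_1} \cdots \psi_t(e_D)^{i_D}\bigr) \\
 &= \frac{1}{i_1! \cdots i_D!} \Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}, \tag{3.9}
\end{align*}

where $e_j = (0, \dots, 1, \dots, 0) \in \mathbb{Z}^D$, the 1 being in the $j$th
position, and $\Psi(e_j) := (\psi_1(e_j), \dots, \psi_t(e_j))$, with products of
such vectors taken coordinatewise.

Comparing (3.8) and (3.9) and using the fact that $\eta$ is a homomorphism
on $G^\Psi$, we obtain

\[ \alpha_{\mathbf{i}} = \frac{1}{i_1! \cdots i_D!}\, \eta\bigl(g_i^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr). \]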

Thus, for each $\mathbf{i}$ with $|\mathbf{i}| = |i_1| + \dots + |i_D| = i$, we have

\[ \bigl\|\eta\bigl(g_i^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr)\bigr\|_{\mathbb{R}/\mathbb{Z}} \ll_\delta N^{-i}. \tag{3.10} \]

To obtain the desired contradiction with the $(A, N)$-irrationality hypothesis
and thus complete the proof, it suffices (after taking $A$ sufficiently large
depending on $\delta$) to establish that for at least one choice of $\mathbf{i}$,
the map $\xi_{\mathbf{i}} : G_{(i)} \to \mathbb{R}$ defined by

\[ \xi_{\mathbf{i}}(g) := \eta\bigl(g^{\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}}\bigr) \]

is a nontrivial horizontal $i$-character of complexity $O_\delta(1)$.

The complexity bound follows from the fact that the coefficients of the
forms $\psi_i$ are integers of size $O(1)$ and the Baker-Campbell-Hausdorff
formula (Appendix C). That at least one of these maps is nontrivial follows
from the fact that $\eta$ is nontrivial on $G^\Psi_{(i)}$ and the fact that the
vectors $\Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D}$, $i_1 + \dots + i_D = i$, span
$\Psi^{[i]}$ (a consequence of Lemma 3.3).

Furthermore $\xi_{\mathbf{i}}$ always annihilates $\Gamma_{(i)}$ and $G_{(i+1)}$
(by the asserted maximality of $i$). To qualify as an $i$-horizontal character
we must also show that it vanishes on $[G_{(j)}, G_{(i-j)}]$ for each
$0 \le j \le i$. To this end, note that we may factor

\[ \Psi(e_1)^{i_1} \cdots \Psi(e_D)^{i_D} = w w', \]

where $w \in \Psi^{[j]}$ and $w' \in \Psi^{[i-j]}$. Indeed, we may take

\[ w = \Psi(e_1)^{j_1} \cdots \Psi(e_D)^{j_D}, \qquad w' = \Psi(e_1)^{i_1 - j_1} \cdots \Psi(e_D)^{i_D - j_D} \]

for any indices $j_1, \dots, j_D$ with $j_l \le i_l$ and $j_1 + \dots + j_D = j$,
whereupon the relevant containments follow from Lemma 3.3. Now if
$g \in G_{(j)}$ and $g' \in G_{(i-j)}$ are arbitrary then we have

\[ [g^w, g'^{w'}] = [g, g']^{ww'} \pmod{G^\Psi_{(i+1)}} \]

by the Baker-Campbell-Hausdorff formula (C.2). Applying $\eta$, which is trivial
on $G^\Psi_{(i+1)}$ by assumption, we obtain

\[ \xi_{\mathbf{i}}([g, g']) = \eta([g, g']^{ww'}) = \eta([g^w, g'^{w'}]) = 0, \]

the last step being a consequence of the fact that $\eta$ has abelian image and
hence vanishes on $[G^\Psi, G^\Psi]$. This concludes the proof of the counting
lemma, Theorem 1.11.

4. Generalised von Neumann type theorems 

In this section we recall a number of results asserting the connection
between Gowers norms and various types of linear configuration. These
results are collectively known in the literature as "generalised von Neumann
theorems". The connection between Gowers norms (not called by that name,
of course) and linear configurations was first made in [17]. A fairly general
result of this type, which appears in [31], is the following.

Theorem 4.1 (Generalised von Neumann Theorem). Let $\Psi = (\psi_1, \dots, \psi_t)$
be a collection of linear forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$
for some $t, D \ge 1$, any two of which are linearly independent. Then there
exists an integer $s = s(\Psi)$ with the property that one has the inequality

\[ \Bigl| \mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f_i(\psi_i(n)) \Bigr| \ll_{t,D,\Psi} \inf_{1 \le i \le t} \|f_i\|_{U^{s+1}[N]} \tag{4.1} \]

for all $N \ge 1$ and all $f_1, \dots, f_t : [N] \to \mathbb{C}$ bounded in
magnitude by 1.

Remarks. A natural value of $s(\Psi)$ comes from the proof in [31], which
proceeds via $s$ applications of the Cauchy-Schwarz inequality. For this reason
Gowers and Wolf [22] call $s(\Psi)$ the Cauchy-Schwarz complexity of the system
$\Psi$. There is a linear-algebra recipe for computing $s(\Psi)$ which is not
especially enlightening but sufficiently simple that we can give it here (see
the introduction to [31] for more details). If $1 \le i \le t$ and $s \ge 0$
then we say that $\Psi$ has $i$-complexity at most $s$ if one can cover the
$t - 1$ forms $\{\psi_j : j \in [t] \setminus \{i\}\}$ by $s+1$ classes, such that
$\psi_i$ does not lie in the linear span of the forms in any one of these
classes. Then $s(\Psi)$ is the smallest $s$ for which the system has
$i$-complexity at most $s$ for all $1 \le i \le t$. Note, then, that the
Cauchy-Schwarz complexity of the system
$\Psi = \{n_1, n_1 + n_2, \dots, n_1 + (k-1)n_2\}$ corresponding to a $k$-term
arithmetic progression is $k - 2$. As a final remark, let us note that
Theorem 4.1, as proved in [31, Appendix C], is regrettably somewhat difficult
to understand as we had to establish a more general result in which the
functions $f_i$ were bounded by an arbitrary pseudorandom measure, and this is
notationally heavy. For a gentle explanation of the special case
$\Psi = \{n_1, n_1 + n_2, n_1 + 2n_2, n_1 + 3n_2\}$ (where $s = 2$) the reader may
consult [26, Proposition 1.11]. A sketch of the proof of Theorem 4.1 is also
given in [22, §2]. See also [5] for a variant of these notions of complexity in
the ergodic setting, and for polynomial forms instead of linear ones.
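
The linear-algebra recipe for $s(\Psi)$ in the Remarks above can be tried out by brute force on small systems. The following is only an illustrative sketch: the function names (`rank`, `in_span`, `i_complexity`, `cs_complexity`, `ap_system`) are our own and not taken from [31] or [22], each form is represented by its integer coefficient vector, and the search over coverings is exponential in $t$, so it is only meant for toy cases such as arithmetic progressions.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a list of integer vectors, via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def in_span(v, vectors):
    """True if v lies in the linear span of `vectors`."""
    return rank(vectors + [v]) == rank(vectors)


def i_complexity(forms, i):
    """Smallest s such that the forms other than forms[i] can be covered by
    s + 1 classes, none of whose spans contains forms[i]."""
    others = [f for j, f in enumerate(forms) if j != i]
    for s in range(max(1, len(others))):
        for labels in product(range(s + 1), repeat=len(others)):
            classes = [[f for f, lab in zip(others, labels) if lab == c]
                       for c in range(s + 1)]
            if all(not in_span(forms[i], cls) for cls in classes):
                return s
    raise ValueError("forms are not pairwise linearly independent")


def cs_complexity(forms):
    """Cauchy-Schwarz complexity: the largest i-complexity over all i."""
    return max(i_complexity(forms, i) for i in range(len(forms)))


def ap_system(k):
    """Coefficient vectors of n1, n1 + n2, ..., n1 + (k-1) n2."""
    return [[1, j] for j in range(k)]
```

For the progression systems `ap_system(k)` any two of the remaining forms already span the whole plane, so every class must be a singleton and the search returns $k - 2$, in agreement with the Remarks.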

We will need a twisted version of the Generalised von Neumann inequality, 
in which an additional nilsequence of lower degree is inserted. We shall not 
need it for general linear forms, so we formulate just the special case we 
need. 

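Lemma 4.2 (Twisted generalised von Neumann theorem). Let $k \ge 3$, let
$f_0, \dots, f_{k-1} : [N] \to \mathbb{C}$ be bounded in magnitude by 1, let
$c_0, \dots, c_{k-1}$ be distinct integers, and let $F(g(n)\Gamma)$ be a degree
$\le (k-2)$ nilsequence of complexity at most $M$. Then

\[ \Bigl| \mathbb{E}_{n \in [N]} \mathbb{E}_{d \in [-N,N]} F(g(d)\Gamma) \prod_{i=0}^{k-1} f_i(n + c_i d) \Bigr| \ll_{k,M,c_0,\dots,c_{k-1}} \inf_{0 \le i \le k-1} \|f_i\|_{U^{k-1}[N]}. \]
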

Proof. We induct on $k$, starting with the case $k = 3$. The underlying
nilmanifold $G/\Gamma$ is then a torus $(\mathbb{R}/\mathbb{Z})^m$ with
$m = O_M(1)$, and $g(n) = \theta n + \theta_0$ may be taken to be linear. By a
standard Fourier decomposition we may assume that $F(x) = e(\xi \cdot x)$ for
some $\xi \in \mathbb{Z}^m$ with $|\xi| = O_M(1)$, in which case we may rewrite
the estimate to be proven as

\[ \bigl| \mathbb{E}_{n \in [N]} \mathbb{E}_{d \in [-N,N]} f_0(n + c_0 d) f_1'(n + c_1 d) f_2'(n + c_2 d) \bigr| \ll_M \inf_i \|f_i\|_{U^2[N]}, \]

where $f_1'(n) = f_1(n) e(-(c_2 - c_1)^{-1} \xi \cdot \theta n)$ and
$f_2'(n) = f_2(n) e((c_2 - c_1)^{-1} \xi \cdot \theta n)$. However it is easy to
establish the invariance properties $\|f_1'\|_{U^2} = \|f_1\|_{U^2}$ and
$\|f_2'\|_{U^2} = \|f_2\|_{U^2}$, and so the result follows immediately from
Theorem 4.1.

Now suppose that $k \ge 4$ and the claim has already been proven for smaller
$k$. By permuting indices and then translating $n$, it suffices to show that

\[ \Bigl| \mathbb{E}_{n \in [N]} \mathbb{E}_{d \in [-N,N]} F(g(d)\Gamma) \prod_{i=0}^{k-1} f_i(n + c_i d) \Bigr| \ll_{k,M,c_0,\dots,c_{k-1}} \|f_1\|_{U^{k-1}[N]} \tag{4.2} \]

under the assumption that $c_0 = 0$.

Recall from [30] that we define a vertical character to be a continuous
homomorphism $\xi : G_{(k-2)}/(G_{(k-2)} \cap \Gamma) \to \mathbb{R}/\mathbb{Z}$.
We say that $F$ has vertical frequency $\xi$ if one has
$F(g_{k-2} x) = e(\xi(g_{k-2})) F(x)$ for all $x \in G/\Gamma$ and
$g_{k-2} \in G_{(k-2)}$. By a standard Fourier decomposition in the vertical
direction (e.g. by arguing exactly as in [30, Lemma 3.7]) we may assume without
loss of generality that $F$ has a vertical frequency $\xi$.

Applying the Cauchy-Schwarz inequality, we can bound the left-hand side
of (4.2) by

\[ \ll \Bigl| \mathbb{E}_{n \in [N]} \mathbb{E}_{d,h \in [-N,N]} F(g(d+h)\Gamma) \overline{F(g(d)\Gamma)} \prod_{i=1}^{k-1} f_i(n + c_i d + c_i h) \overline{f_i(n + c_i d)} \Bigr|^{1/2}. \]

Because $F$ has a vertical frequency, $F(g(d+h)\Gamma)\overline{F(g(d)\Gamma)}$ is
a degree $\le (k-3)$ nilsequence of complexity $O_{M,k}(1)$ (see [30,
Proposition 7.2]). Applying the induction hypothesis, and writing
$\Delta_h f(n) := f(n+h)\overline{f(n)}$, we may thus bound the above expression
by

\[ \ll_{M,k,c_0,\dots,c_{k-1}} \bigl( \mathbb{E}_{h \in [-N,N]} \|\Delta_{c_1 h} f_1\|_{U^{k-2}[N]} \bigr)^{1/2}, \]

which by Hölder's inequality can be bounded by

\[ \ll_{M,k,c_0,\dots,c_{k-1}} \bigl( \mathbb{E}_{h \in [-|c_1| N, |c_1| N]} \|\Delta_h f_1\|_{U^{k-2}[N]}^{2^{k-2}} \bigr)^{1/2^{k-1}}, \]

and the claim follows from the recursive definition of the Gowers norms. □
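
The modulation invariance $\|f_1'\|_{U^2} = \|f_1\|_{U^2}$ used in the case $k = 3$ of the proof above can be checked numerically on a short interval. The sketch below is an illustration under stated assumptions: it takes $\|f\|_{U^2[N]}^4$ to be the average of $f(x)\overline{f(x+h_1)}\,\overline{f(x+h_2)}f(x+h_1+h_2)$ over all parallelograms contained in the interval (one common normalisation of the interval Gowers norm), indexes the interval as $\{0, \dots, N-1\}$, and uses an arbitrary test function and frequency; the names `u2_norm`, `f`, `twisted` and `theta` are hypothetical.

```python
import cmath


def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def u2_norm(f):
    """U^2 norm of f on {0, ..., N-1}: fourth root of the average of
    f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2) over parallelograms that
    lie inside the interval (brute force, O(N^3))."""
    N = len(f)
    total = 0
    count = 0
    for x in range(N):
        for h1 in range(-N + 1, N):
            for h2 in range(-N + 1, N):
                pts = (x, x + h1, x + h2, x + h1 + h2)
                if all(0 <= p < N for p in pts):
                    total += (f[pts[0]] * f[pts[1]].conjugate()
                              * f[pts[2]].conjugate() * f[pts[3]])
                    count += 1
    return abs(total / count) ** 0.25


N = 12
theta = 2 ** 0.5  # arbitrary real frequency, chosen for illustration
f = [complex(((3 * n * n + n) % 7) / 7, ((5 * n) % 3) / 3) for n in range(N)]
twisted = [f[n] * e(theta * n) for n in range(N)]
```

The two norms `u2_norm(f)` and `u2_norm(twisted)` agree up to rounding, because the phases at the four vertices of each parallelogram cancel: $\theta(x - (x+h_1) - (x+h_2) + (x+h_1+h_2)) = 0$.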

Remark. The above argument is very similar to the short proof presented
in [33, Appendix G] that $s$-step nilsequences obstruct uniformity in the
$U^{s+1}$-norm (that is, the inverse conjecture GI($s$) is an if-and-only-if
statement).

5. On a conjecture of Bergelson, Host, and Kra 

We now apply the arithmetic regularity and counting lemmas to establish
Theorem 1.12, the proof of the conjecture of Bergelson, Host and Kra. Our
strategy here can be viewed as a finitary analogue of the ergodic theory
arguments in [4]; however, there are some slight differences in our approach,
which we comment on at the end of this section.

It will suffice to prove the following claim. 

Theorem 5.1. Let $k = 1, 2, 3$ or $4$, and suppose that $0 < \alpha < 1$ and
$\varepsilon > 0$. Then for any $N \ge 1$ and any subset $A \subseteq [N]$ of
density $|A| \ge \alpha N$, one can find a function
$\mu : \mathbb{Z} \to \mathbb{R}^+$ such that

\[ \mathbb{E}_{d \in [-N,N]} \mu(d) = 1 + O(\varepsilon) \tag{5.1} \]

and

\[ \sup_{d \in [-N,N]} \mu(d) \ll_{\alpha,\varepsilon} 1 \tag{5.2} \]

such that

\[ \mathbb{E}_{n \in [N];\, d \in [-N,N]} 1_A(n) 1_A(n+d) \cdots 1_A(n + (k-1)d)\, \mu(d) \ge \alpha^k - O(\varepsilon). \tag{5.3} \]

Indeed, from (5.1), (5.3), we see that we have

\[ \mathbb{E}_{n \in [N]} 1_A(n) 1_A(n+d) \cdots 1_A(n + (k-1)d) \ge \alpha^k - O(\varepsilon) \]

for all $d$ in a subset $E$ of $[-N, N]$ with
$\mathbb{E}_{d \in [-N,N]} 1_E(d)\mu(d) \gg_{\alpha,\varepsilon} 1$. (Here
we crucially use the trivial but fundamental fact that $1_A$ is nonnegative.)
From (5.2) we conclude that $|E| \gg_{\alpha,\varepsilon} N$, and Theorem 1.12
follows (after shrinking $\varepsilon$ by an absolute constant). Conversely, it
is not difficult to deduce Theorem 1.12 from Theorem 5.1.
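
To make the set $E$ of "popular" common differences concrete, here is a small brute-force sketch. It is only an illustration: the function names are ours, the threshold $\alpha^k - \varepsilon$ replaces the $\alpha^k - O(\varepsilon)$ of the text by fixing the implied constant to 1, $\alpha$ is taken to be exactly $|A|/N$, and exact rational arithmetic is used so that boundary cases are not affected by rounding.

```python
from fractions import Fraction


def ap_density(A, N, k, d):
    """E_{n in [N]} 1_A(n) 1_A(n+d) ... 1_A(n+(k-1)d), with [N] = {1, ..., N}."""
    members = set(A)
    hits = sum(1 for n in range(1, N + 1)
               if all(n + i * d in members for i in range(k)))
    return Fraction(hits, N)


def popular_differences(A, N, k, eps):
    """All d in [-N, N] whose k-term progression density is at least
    alpha^k - eps, where alpha = |A| / N."""
    alpha = Fraction(len(set(A)), N)
    threshold = alpha ** k - eps
    return [d for d in range(-N, N + 1)
            if ap_density(A, N, k, d) >= threshold]
```

For instance, with $A = [N]$, $N = 100$, $k = 3$ and `eps = Fraction(1, 10)`, the density for difference $d$ is $(100 - 2|d|)/100$ when $|d| \le 49$, so the popular differences are exactly $-5, \dots, 5$. The content of Theorem 1.12 is that for $k \le 4$ this set always has size $\gg_{\alpha,\varepsilon} N$.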

It remains to establish Theorem 5.1. We may assume that $N$ is large
depending on $\alpha, \varepsilon$ as the claim is trivial otherwise (just take
$\mu$ to be the Kronecker delta function at 0).

For $k = 1$ one can simply take $\mu = 1$. For $k = 2$, we first observe that

\[ \mathbb{E}_{n \in [N]} \mathbb{E}_{h \in [-\varepsilon N, \varepsilon N]} 1_A(n + h) = \alpha + O(\varepsilon); \]

applying Cauchy-Schwarz we conclude that

\[ \mathbb{E}_{h, h' \in [-\varepsilon N, \varepsilon N]} \mathbb{E}_{n \in [N]} 1_A(n + h) 1_A(n + h') \ge \alpha^2 - O(\varepsilon). \]

The claim then follows, with $\mu$ being the probability density function of
$h - h'$ as $h, h'$ range uniformly in $[-\varepsilon N, \varepsilon N]$.

Now we turn to the cases $k = 3, 4$. Here, one has to be more sophisticated
about how one chooses $\mu$ (for instance, by using a Behrend set construction
it is not hard to see that the previous choices of $\mu$ do not always work). Let
$\mathcal{F} : \mathbb{R}^+ \to \mathbb{R}^+$ be a sufficiently rapidly growing
function depending on $\alpha, \varepsilon$ in a manner to be specified later.
We apply Theorem 1.2 with $s := k - 2$ to obtain a quantity
$M = O_{\varepsilon,\mathcal{F}}(1)$ and a decomposition

\[ 1_A(n) = f_{\mathrm{nil}}(n) + f_{\mathrm{sml}}(n) + f_{\mathrm{unf}}(n) \tag{5.4} \]

such that

(i) $f_{\mathrm{nil}}(n)$ is a $(\mathcal{F}(M), N)$-irrational degree $\le k - 2$
virtual nilsequence of complexity at most $M$ and scale $N$;

(ii) $f_{\mathrm{sml}}$ has an $L^2[N]$ norm of at most $\varepsilon/100$;

(iii) $f_{\mathrm{unf}}$ has a $U^{k-1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(iv) $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}}$ are all bounded in
magnitude by 1; and

(v) $f_{\mathrm{nil}}$ and $f_{\mathrm{nil}} + f_{\mathrm{sml}}$ are non-negative.

It is clear that $|\mathbb{E}_{n \in [N]} f_{\mathrm{sml}}(n)| = O(\varepsilon)$,
and furthermore, by Theorem 4.1 (setting all but one of the functions equal
to 1) we also have $|\mathbb{E}_{n \in [N]} f_{\mathrm{unf}}(n)| = O(\varepsilon)$
if $\mathcal{F}$ grows rapidly enough. Therefore

\[ \mathbb{E}_{n \in [N]} f_{\mathrm{nil}}(n) \ge \alpha - O(\varepsilon). \tag{5.5} \]

The heart of the matter is the following proposition.

Proposition 5.2 (Bergelson-Host-Kra for $f_{\mathrm{nil}}$). Let $k = 3, 4$. Then
there exists a non-negative $(k-2)$-step nilsequence
$\mu : \mathbb{Z} \to \mathbb{R}^+$ of complexity $O_{\alpha,\varepsilon,M}(1)$
obeying the normalisation

\[ \mathbb{E}_{d \in [N]} \mu(d) = 1 + O(\varepsilon) \tag{5.6} \]

and such that

\[ \mathbb{E}_{n,d \in [N]} f_{\mathrm{nil}}(n) f_{\mathrm{nil}}(n+d) \cdots f_{\mathrm{nil}}(n + (k-1)d)\, \mu(d) \ge \alpha^k - O(\varepsilon). \tag{5.7} \]

Deduction of Theorem 5.1 from Proposition 5.2. Using (5.4), one can expand
the left-hand side of (5.3) into $3^k$ terms, one of which is (5.7). As
for the other terms, any term involving at least one copy of $f_{\mathrm{unf}}$
is of size $O_{\alpha,\varepsilon,M}(1/\mathcal{F}(M))$ by Lemma 4.2 and the
$U^{k-1}$ norm bound on $f_{\mathrm{unf}}$. Finally, consider a term that involves
at least one copy of $f_{\mathrm{sml}}$. Suppose first that we have a term that
involves $f_{\mathrm{sml}}(n)$. Then after performing the average in $d$ using
(5.6), we see that this term is $O(\mathbb{E}_{n \in [N]} |f_{\mathrm{sml}}(n)|)$,
which is $O(\varepsilon)$ by the $L^2[N]$ bound on $f_{\mathrm{sml}}$ and the
Cauchy-Schwarz inequality. Similarly for any term that involves
$f_{\mathrm{sml}}(n + id)$, after making a change of variables
$(n', d) := (n + id, d)$. Putting all this together we obtain the result. □

It remains, of course, to establish Proposition 5.2. We may assume that
$N$ is sufficiently large depending on $\alpha, \varepsilon, M$, as the claim is
trivial otherwise by taking $\mu$ to be a delta function.

We first establish the proposition in the easier of the two cases, namely
the case $k = 3$. This was previously considered in [25]. In this case it is
actually easier to work with the (easier) weak regularity lemma, Proposition
2.7, in which the degree 1 polynomial sequence $g(n)$ is not required to be
irrational. Note that we have not made any use of irrationality so far, though
we shall do so later when discussing the case $k = 4$. We may identify
$G/\Gamma$ with $(\mathbb{R}/\mathbb{Z})^m$ for some $m = O_M(1)$ and, by
modulating $F$ if necessary, we may suppose that $g(n) = \theta n$ is linear with
no constant term, where $\theta \in \mathbb{R}^m$. Then

\[ f_{\mathrm{nil}}(n) = F(n\theta), \]

where $F : (\mathbb{R}/\mathbb{Z})^m \to \mathbb{C}$ has Lipschitz norm $O_M(1)$.

Let $\varepsilon' > 0$ be a small number depending on $\varepsilon$ and $M$ to be
chosen later, and let $B_1, B_2 \subseteq [-N, N]$ be the two Bohr sets

\[ B_1 := \{ d \in [-\varepsilon' N, \varepsilon' N] : \mathrm{dist}_{(\mathbb{R}/\mathbb{Z})^m}(\theta d, 0) \le \varepsilon' \} \]

and

\[ B_2 := \{ d \in [-\varepsilon' N, \varepsilon' N] : \mathrm{dist}_{(\mathbb{R}/\mathbb{Z})^m}(\theta d, 0) \le \varepsilon'/2 \}. \]

By the usual Dirichlet pigeonhole argument we see that
$|B_2| \gg_{\varepsilon', m} N$. Also, from the Lipschitz nature of $F$, we see
that

\[ f_{\mathrm{nil}}(n + d) = f_{\mathrm{nil}}(n) + O_M(\varepsilon') \]

whenever $d \in B_1$ and $n \in [-(1 - \varepsilon')N, (1 - \varepsilon')N]$. As a
consequence, it follows that

\[ \mathbb{E}_{n \in [N]} f_{\mathrm{nil}}(n) f_{\mathrm{nil}}(n + d) f_{\mathrm{nil}}(n + 2d) = \mathbb{E}_{n \in [N]} f_{\mathrm{nil}}(n)^3 + O_M(\varepsilon') \]

for such $d$. However from (5.5) and Hölder's inequality one has

\[ \mathbb{E}_{n \in [N]} f_{\mathrm{nil}}(n)^3 \ge \alpha^3 - O(\varepsilon). \tag{5.8} \]

Proposition 5.2 (in the case $k = 3$) now follows by taking
$\mu(d) = c\psi(\theta d)$, where
$\psi : (\mathbb{R}/\mathbb{Z})^m \to [0, 1]$ is an
$O_{M,\varepsilon'}(1)$-Lipschitz function which is 1 on $B_2$ and 0 outside
$B_1$, $c = O_{M,\varepsilon'}(1)$ is a suitable normalisation constant, and
by taking $\varepsilon'$ to be suitably small.
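
The role of the Bohr set $B_1$ in the case $k = 3$ can be seen in a one-dimensional toy computation. The sketch below is an illustration only: it takes $m = 1$, the specific frequency $\theta = \sqrt{2}$, and the specific $\pi$-Lipschitz function $F(x) = (1 + \cos 2\pi x)/2$, none of which come from the text, and checks that shifting $n$ by an element of $B_1$ moves $F(\theta n)$ by at most the Lipschitz constant times $\varepsilon'$.

```python
import math


def torus_dist(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def bohr_set(theta, N, eps):
    """All d with |d| <= eps * N and dist(theta * d, Z) <= eps (case m = 1)."""
    bound = int(eps * N)
    return [d for d in range(-bound, bound + 1)
            if torus_dist(theta * d) <= eps]


def F(x):
    """A 1-periodic test function with Lipschitz constant pi."""
    return (1 + math.cos(2 * math.pi * x)) / 2


theta, N, eps = math.sqrt(2), 1000, 0.05
B1 = bohr_set(theta, N, eps)

# Largest change of f_nil(n) = F(theta * n) under a shift by an element of B1.
max_shift = max(abs(F(theta * (n + d)) - F(theta * n))
                for d in B1 for n in range(1, N + 1))
```

Here `B1` contains, for example, $d = 12$ and $d = 29$ (good rational approximations to $\sqrt{2}$ have denominators 12 and 29), and `max_shift` stays below $\pi\varepsilon'$, which is the one-dimensional instance of $f_{\mathrm{nil}}(n+d) = f_{\mathrm{nil}}(n) + O_M(\varepsilon')$.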

It is important to note here that the error term $O(\varepsilon)$ in (5.8) is
uniform in $M$, as otherwise the argument would not work (recall that $M$ will
depend on $\varepsilon$). The dependence on $M$ is instead manifested where it
does not do significant damage to the argument, namely in the complexity of the
weight $\mu$.

We now turn to the $k = 4$ case of Proposition 5.2. For simplicity, let us
first consider the model case when $f_{\mathrm{nil}}$ is a genuine nilsequence and
not just a virtual nilsequence, that is to say

\[ f_{\mathrm{nil}}(n) = F(g(n)\Gamma) \tag{5.9} \]

where $(G/\Gamma, G_\bullet)$ is a degree $\le 2$ filtered nilmanifold of
complexity $O_M(1)$, and $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is
$(\mathcal{F}(M), N)$-irrational. By Taylor expansion (see Appendix A), we have

\[ g(n) = g_0 g_1^{n} g_2^{\binom{n}{2}} \]

for some $g_0, g_1 \in G$ and $g_2 \in G_{(2)}$. The
$(\mathcal{F}(M), N)$-irrationality of $g$ ensures certain irrationality
properties on $g_1$ and $g_2$, though we will not need these properties
explicitly here, as we will only be using them through the counting
lemma (Theorem 1.11), which we shall be using as a black box.

Let $\pi : G \to T_1$ be the projection homomorphism to the torus$^{12}$
$T_1 := G/(G_{(2)}\Gamma)$. Then

\[ \pi(g(n)) = \pi(g_0)\pi(g_1)^n. \]

Let $\varepsilon' > 0$ be a small quantity depending on $\varepsilon, M$ to be
chosen later. We set

\[ \mu(d) := c\, 1_{[-\varepsilon' N, \varepsilon' N]}(d)\, \phi(\pi(g_1)^d), \]

where, much as in the analysis of the case $k = 3$,
$\phi : T_1 \to \mathbb{R}^+$ is a smooth non-negative cutoff to the ball of
radius $\varepsilon'$ centered at the origin that is not identically zero, and
$c$ is a normalisation constant to be chosen shortly. From Theorem 1.11 one has



\[ \mathbb{E}_{d \in [-\varepsilon' N, \varepsilon' N]} \phi(\pi(g_1)^d) = \int_{T_1} \phi + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1). \]

Thus if we set

\[ c := \Bigl( \int_{T_1} \phi \Bigr)^{-1} = O_{\varepsilon', M}(1) \tag{5.10} \]

then we have the normalisation (5.6), if $\mathcal{F}$ is sufficiently rapid,
depending on the way in which $\varepsilon'$ depends on $\varepsilon, M$, and $N$
is sufficiently large depending on $\varepsilon, \varepsilon', M$. From the
bound on $c$ we see that $\mu$ is a degree $\le 1$ (and hence also degree
$\le 2$) nilsequence of complexity $O_{\varepsilon', M}(1)$.

12 Note this is not quite the same thing as the horizontal torus, which is so
important in [30], which is $(G/\Gamma)_{\mathrm{ab}} := G/[G, G]\Gamma$.

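We now apply the counting lemma, Theorem 1.11, to conclude that

\begin{align*}
\mathbb{E}_{n,d \in [N]} & f_{\mathrm{nil}}(n) f_{\mathrm{nil}}(n + d) f_{\mathrm{nil}}(n + 2d) f_{\mathrm{nil}}(n + 3d)\, \mu(d) \\
 &= \int_{G^\Psi/\Gamma^\Psi} \tilde F + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1) \tag{5.11}
\end{align*}

where $G^\Psi \subseteq G^4$ is the Leibman group associated to the collection
$\Psi = (\psi_0, \psi_1, \psi_2, \psi_3) : \mathbb{Z}^2 \to \mathbb{Z}^4$ of
linear forms $\psi_i(n) := n_1 + i n_2$, $i = 0, 1, 2, 3$, that is to say the
Hall-Petresco group $\mathrm{HP}^4(G)$, and
$\tilde F : G^\Psi/\Gamma^\Psi \to \mathbb{C}$ is the function

\[ \tilde F(x_0, x_1, x_2, x_3) := c\, \phi(\pi(x_1)\pi(x_0)^{-1}) F(x_0) F(x_1) F(x_2) F(x_3) \]

(here we use the identity $\pi(g(n+d))\pi(g(n))^{-1} = \pi(g_1)^d$, immediately
verified from the Taylor expansion).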

We now do some calculations in the Hall-Petresco group very similar to
those in [4]. We saw in §3 that

\[ G^\Psi = \{ (g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3) : g_0, g_1 \in G, g_2 \in G_{(2)} \} \]

(note, of course, that $G_{(3)} = \mathrm{id}$ in the case we are considering).
For our calculations it is convenient to use the following obviously equivalent
representation:

\begin{align*}
G^\Psi = \{ & (g_0 g_{2,0}, g_0 g_1 g_{2,1}, g_0 g_1^2 g_{2,2}, g_0 g_1^3 g_{2,3}) : g_0, g_1 \in G; \\
 & g_{2,0}, \dots, g_{2,3} \in G_{(2)};\ g_{2,0}\, g_{2,1}^{-3}\, g_{2,2}^{3}\, g_{2,3}^{-1} = \mathrm{id} \}.
\end{align*}

Here we have taken note of the fact that

\[ \Psi^{[2]} = \{ (x_0, x_1, x_2, x_3) \in \mathbb{R}^4 : x_0 - 3x_1 + 3x_2 - x_3 = 0 \}. \]

This last equation is quite special in that it exhibits a certain "positivity",
as we shall see later; this is key to our argument. The lattice $\Gamma^\Psi$
can be similarly described by requiring $g_0, g_1, g_{2,0}, \dots, g_{2,3}$ to
also lie in $\Gamma$. As a consequence of this, an arbitrary point of the
nilmanifold $G^\Psi/\Gamma^\Psi$ can be parameterised uniquely as

\[ (g_0 g_{2,0}, g_0 g_1 g_{2,1}, g_0 g_1^2 g_{2,2}, g_0 g_1^3 g_{2,3})\Gamma^\Psi \tag{5.12} \]

where $g_0, g_1$ lie in a fundamental domain $\Sigma_1 \subset G$ of the
horizontal torus $T_1$ (i.e. a smooth manifold with boundary on which $\pi$ is a
bijection from $\Sigma_1$ to $T_1$), and $g_{2,0}, \dots, g_{2,3}$ lie in a
fundamental domain $\Sigma_2 \subset G_{(2)}$ of the vertical torus
$T_2 := G_{(2)}/\Gamma_{(2)}$ subject to the constraint
$g_{2,0}\, g_{2,1}^{-3}\, g_{2,2}^{3}\, g_{2,3}^{-1} \in \Gamma_{(2)}$. For
such a point (5.12), the function $\tilde F$ takes the value

\[ c\, \phi(\pi(g_1)) \prod_{j=0}^{3} F(g_0 g_1^j g_{2,j} \Gamma). \]

On the support of $\phi$, $g_1$ is a distance $O_M(\varepsilon')$ from the
identity (if the fundamental domain $\Sigma_1$ was chosen in a suitably smooth
fashion), and so by the Lipschitz nature of $F$ and the boundedness of $g_0$ we
have

\[ F(g_0 g_1^j g_{2,j} \Gamma) = F(g_0 g_{2,j} \Gamma) + O_M(\varepsilon'). \]

As a consequence, the integral $\int_{G^\Psi/\Gamma^\Psi} \tilde F$ can be
expressed as

\[ c \int_{g_0 \in \Sigma_1} \int_{g_1 \in \Sigma_1} \phi(\pi(g_1)) \Bigl( \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) + O_M(\varepsilon') \Bigr) \tag{5.13} \]

where all integrals are with respect to Haar measure.
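
The linear relation $x_0 - 3x_1 + 3x_2 - x_3 = 0$ cutting out $\Psi^{[2]}$, which drives the constraint on $g_{2,0}, \dots, g_{2,3}$ above, can be confirmed by a short computation. The sketch below assumes (as in the description of $\Psi^{[2]}$ as the span of coordinatewise products of pairs of vectors from the image of $\Psi$) that $\Psi^{[2]}$ is spanned by the vectors $(i^j)_{i=0,\dots,3}$ for $j = 0, 1, 2$; the variable names are our own.

```python
from math import comb


def psi_power(j, k=4):
    """The vector (0^j, 1^j, ..., (k-1)^j), with the convention 0^0 = 1."""
    return [i ** j for i in range(k)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


relation = [1, -3, 3, -1]

# Spanning vectors of Psi^[2] for the forms n1 + i*n2, i = 0..3.
spanning = [psi_power(j) for j in range(3)]
dots = [dot(relation, v) for v in spanning]

# The exponents of g_2 in (g_0, g_0 g_1, g_0 g_1^2 g_2, g_0 g_1^3 g_2^3).
g2_exponents = [comb(i, 2) for i in range(4)]
exponent_dot = dot(relation, g2_exponents)

# A cubic vector is not annihilated, so the relation is special to degree <= 2.
cubic_dot = dot(relation, psi_power(3))
```

All three entries of `dots` and the value `exponent_dot` vanish, while `cubic_dot` equals $-6$; the relation is the third finite difference, which kills polynomials of degree at most 2.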

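Let $\xi \in \hat T_2$ be a vertical character, i.e. a continuous homomorphism
from $T_2$ to $\mathbb{R}/\mathbb{Z}$. For any $x \in G/\Gamma$, we can define the
vertical Fourier transform $\hat F(x, \xi)$ to be the quantity

\[ \hat F(x, \xi) := \int_{g_2 \in T_2} e(-\xi(g_2)) F(g_2 x). \]

From the Fourier inversion formula we have

\[ \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) = \sum_{\xi \in \hat T_2} |\hat F(g_0\Gamma, \xi)|^2 |\hat F(g_0\Gamma, 3\xi)|^2. \]

In particular, we have$^{13}$

\[ \int_{\substack{g_{2,0}, \dots, g_{2,3} \in T_2 \\ g_{2,0} g_{2,1}^{-3} g_{2,2}^{3} g_{2,3}^{-1} = \mathrm{id}}} \prod_{j=0}^{3} F(g_0 g_{2,j} \Gamma) \ge |\hat F(g_0\Gamma, 0)|^4. \]

Inserting this bound and (5.10) into (5.13), we conclude that

\[ \int_{G^\Psi/\Gamma^\Psi} \tilde F \ge \int_{g_0 \in \Sigma_1} |\hat F(g_0\Gamma, 0)|^4 - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1). \]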
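
The "positivity" coming from the relation $x_0 - 3x_1 + 3x_2 - x_3 = 0$ can be tested in a finite model. The sketch below is an illustration under assumptions of our own: the vertical torus $T_2$ is replaced by the cyclic group $\mathbb{Z}/7\mathbb{Z}$, the fibre function $g_2 \mapsto F(g_2 x)$ by an arbitrary real-valued list `F`, and integrals by averages. It compares the constrained fourfold average with the Fourier side $\sum_\xi |\hat F(\xi)|^2 |\hat F(3\xi)|^2$ and with the lower bound $|\hat F(0)|^4$.

```python
import cmath

m = 7
F = [0.9, 0.1, 0.4, 0.7, 0.2, 0.8, 0.3]  # arbitrary real values on Z/m


def fourier(xi):
    """Normalised Fourier coefficient E_a F(a) e(-xi * a / m)."""
    return sum(F[a] * cmath.exp(-2j * cmath.pi * xi * a / m)
               for a in range(m)) / m


# Average of F(a0) F(a1) F(a2) F(a3) over a0 - 3*a1 + 3*a2 - a3 = 0 (mod m),
# parameterised by the free variables a1, a2, a3.
constrained = sum(F[(3 * a1 - 3 * a2 + a3) % m] * F[a1] * F[a2] * F[a3]
                  for a1 in range(m)
                  for a2 in range(m)
                  for a3 in range(m)) / m ** 3

fourier_side = sum(abs(fourier(xi)) ** 2 * abs(fourier(3 * xi)) ** 2
                   for xi in range(m))
mean_fourth = abs(fourier(0)) ** 4
```

Up to rounding, `constrained` equals `fourier_side`, and both are at least `mean_fourth` because every term of the Fourier sum is non-negative and the $\xi = 0$ term is exactly $|\hat F(0)|^4$. The coefficients $1, -3, 3, -1$ pair up as $\pm 1, \pm 3$, which is what turns the four Fourier factors into two squared moduli.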
From Fubini's theorem we have

\[ \int_{g_0 \in \Sigma_1} \hat F(g_0\Gamma, 0) = \int_{G/\Gamma} F \]

and from Theorem 1.11, (5.9) and (5.5) we have

\[ \int_{G/\Gamma} F = \alpha + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1). \]

Applying Hölder's inequality, we conclude that

\[ \int_{G^\Psi/\Gamma^\Psi} \tilde F \ge \alpha^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1), \]

and so (5.7) follows from (5.11), if $\varepsilon'$ is sufficiently small
depending on $\varepsilon, M$, $\mathcal{F}$ is sufficiently rapid depending on
$\varepsilon$, and $N$ is sufficiently large depending on $\varepsilon', M$.

13 This is the "positivity" alluded to earlier. The argument is essentially
that used in [3] and it is special to the $k = 4$ case, which is of course
consistent with the failure of Theorem 5.1 to extend to $k \ge 5$.

This concludes the proof of the $k = 4$ case of Proposition 5.2 in the special
case when $f_{\mathrm{nil}}(n) = F(g(n)\Gamma)$ with $g$ irrational. Unfortunately
Theorem 1.2 requires us to deal with the somewhat more general setting of
virtual nilsequences, in which there is dependence on $n \bmod q$ or $n/N$. The
extra details required are fairly routine but notationally irritating. Let us
now suppose, then, that

\[ f_{\mathrm{nil}}(n) = F(g(n)\Gamma, n \bmod q, n/N). \tag{5.14} \]

We let $\varepsilon'$ be as before, but modify $\mu$ to now be given by

\[ \mu(d) := q\, 1_{q | d}\, c\, 1_{[-\varepsilon' N, \varepsilon' N]}(d)\, \phi(\pi(g_1)^d), \]

with $c$ still chosen by (5.10). As before, one can use Theorem 1.11 to
establish (5.6).

Now consider the left-hand side of the expression (5.7) we are to bound
in Proposition 5.2, that is to say

\[ \mathbb{E}_{n,d \in [N]} f_{\mathrm{nil}}(n) f_{\mathrm{nil}}(n + d) f_{\mathrm{nil}}(n + 2d) f_{\mathrm{nil}}(n + 3d)\, \mu(d). \tag{5.15} \]

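Splitting into residue classes modulo $q$, we can express this as

\begin{align*}
c\, \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in [N/q]} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} & F(g(qn + qid + r)\Gamma, r, q(n + id)/N)\, \phi(\pi(g_1)^{qd}) \\
 & + o_{N \to \infty; \varepsilon', M}(1).
\end{align*}

We partition $[N/q]$ into intervals $P$ of length $\lfloor \varepsilon' N \rfloor$
(plus a remainder of cardinality $O(\varepsilon' N)$). We can then rewrite the
above expression as

\begin{align*}
c\, \mathbb{E}_P \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} & F(g(qn + qid + r)\Gamma, r, q(n + id)/N)\, \phi(\pi(g_1)^{qd}) \\
 & + O(\varepsilon') + o_{N \to \infty; \varepsilon', M}(1).
\end{align*}

For each such expression, we can use the Lipschitz nature of $F$ to replace
$q(n + id)/N$ by $q n_P/N$, where $n_P$ is an arbitrary element of $P$, losing
only an error of $O_M(\varepsilon')$. The above expression thus becomes

\begin{align*}
c\, \mathbb{E}_P \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} \mathbb{E}_{d \in [-\varepsilon' N/q, \varepsilon' N/q]} \prod_{i=0}^{3} & F(g(qn + qid + r)\Gamma, r, q n_P/N)\, \phi(\pi(g_1)^{qd}) \\
 & + O_M(\varepsilon') + o_{N \to \infty; \varepsilon', M}(1).
\end{align*}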

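Because the orbit $n \mapsto g(n)\Gamma$ is $(\mathcal{F}(M), N)$-irrational, we
see from Lemma A.8 that the shifted translate
$n \mapsto g(q(n + n_P) + r)\Gamma$ is $(\gg_M \mathcal{F}(M), N)$-irrational.
We may then argue as in the previous case and bound the above average below by

\begin{align*}
\ge \mathbb{E}_P \mathbb{E}_{r \in [q]} \Bigl| \int_{G/\Gamma} F(\cdot, r, q n_P/N) \Bigr|^4 & - O(\varepsilon) - O_M(\varepsilon') \\
 & - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1).
\end{align*}

Using Theorem 1.11 again, we have

\[ \mathbb{E}_{n \in P} f_{\mathrm{nil}}(qn + r) = \int_{G/\Gamma} F(\cdot, r, q n_P/N) + o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) + o_{N \to \infty; \varepsilon', M}(1) \]

and so (5.15) is at least

\begin{align*}
\ge \mathbb{E}_P \mathbb{E}_{r \in [q]} \bigl| \mathbb{E}_{n \in P} f_{\mathrm{nil}}(qn + r) \bigr|^4 & - O(\varepsilon) - O_M(\varepsilon') \\
 & - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1).
\end{align*}

Now from (5.5) and double-counting one has

\[ \mathbb{E}_P \mathbb{E}_{r \in [q]} \mathbb{E}_{n \in P} f_{\mathrm{nil}}(qn + r) = \alpha + O(\varepsilon) \]

and so, from Hölder's inequality, we deduce that (5.15) is

\[ \ge \alpha^4 - O(\varepsilon) - O_M(\varepsilon') - o_{\mathcal{F}(M) \to \infty; \varepsilon', M}(1) - o_{N \to \infty; \varepsilon', M}(1). \]

Proposition 5.2 now follows by once again choosing $\varepsilon'$ small enough
depending on $\varepsilon, M$, and choosing $\mathcal{F}$ rapid enough depending
on $\varepsilon$, and $N$ sufficiently large depending on
$\varepsilon, \varepsilon', M$.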

Remark. Our arguments are similar to, but slightly different from, the
ergodic theory arguments in [4]. However it is likely that the argument in [4]
can be translated to a finitary setting; we sketch how this would proceed as
follows, restricting attention to the $k = 4$ case for concreteness. The goal is
to obtain a lower bound
$\mathbb{E}_{n \in [N]} f(n) f(n + d) f(n + 2d) f(n + 3d) \ge \alpha^4 - O(\varepsilon)$
for some positive density set of values of $d$. The analogue to the argument
in [4] would proceed by performing the regularity lemma decomposition at
step $s = 3$ rather than $s = 2$, so that the error $f_{\mathrm{unf}}$ is tiny in
the $U^4$ norm and not just the $U^3$ norm. From this and Theorem 4.1 one can
show that

\[ \mathbb{E}_{d \in [N]} \bigl| \mathbb{E}_{n \in [N]} f_1(n) f_2(n + d) f_3(n + 2d) f_4(n + 3d) \bigr|^2 \]

is tiny whenever at least one of $f_1, f_2, f_3, f_4$ is equal to
$f_{\mathrm{unf}}$. As a consequence,
$\mathbb{E}_{n \in [N]} f_1(n) f_2(n + d) f_3(n + 2d) f_4(n + 3d)$ is negligible
for almost all $d$.
We can thus ignore the contribution of $f_{\mathrm{unf}}$. The remainder of the argument
proceeds along similar lines as above, but at one higher step (though the 
3-step nilsequences involved can quickly be reduced to 2-step nilsequences, 
cf. [U Section 8.1] or Section [7] below). 

One of the innovations in this paper is to introduce weights such as $\mu(d)$,
controlling the double average
$\mathbb{E}_{n,d \in [N]} f(n) f(n + d) f(n + 2d) f(n + 3d) \mu(d)$
rather than controlling the single average
$\mathbb{E}_{n \in [N]} f(n) f(n + d) f(n + 2d) f(n + 3d)$ for many values $d$.
Thanks to the twisted generalised von Neumann theorem (Lemma 4.2), the
"complexity" of such double averages is slightly less than that of the single
averages, and in particular our proof of Theorem 1.12 requires only the inverse
$U^3$ theorem from [28] rather than the more difficult inverse $U^4$ theorem
from [33].

6. Proof of Szemeredi's theorem 

We turn now to the proof of Szemerédi's theorem. We deemed this result
too famous to state in the introduction but, for the sake of fixing notation,
we recall it here now. It is most natural to establish what might be called
the "functional" form of the theorem which is a priori a stronger statement
(though quite easily shown to be equivalent to the standard formulation by
an argument of Varnavides [57]).

Theorem 6.1 (Szemerédi's theorem). Let $0 < \alpha \le 1$, let $k \ge 3$, and
let $N \ge 1$. If $f : [N] \to [0, 1]$ is a function with
$\mathbb{E}_{n \in [N]} f(n) \ge \alpha$ then

\[ \Lambda_k(f, f, \dots, f) \gg_{k,\alpha} 1, \]

where

\[ \Lambda_k(f_1, \dots, f_k) := \mathbb{E}_{n \in [N];\, d \in [-N,N]} f_1(n) f_2(n + d) \cdots f_k(n + (k-1)d) \]

is the multilinear operator counting arithmetic progressions.
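
For small $N$ the operator $\Lambda_k$ can be evaluated directly from its definition. The following sketch is illustrative only: the name `Lambda` and the representation of each $f_j$ as a dictionary on $[N] = \{1, \dots, N\}$ (extended by zero elsewhere) are our own conventions, and the double loop over $(n, d)$ is a brute-force evaluation rather than anything used in the proof.

```python
from fractions import Fraction


def Lambda(fs, N):
    """Lambda_k(f_1, ..., f_k): the average over n in [N] and d in [-N, N] of
    f_1(n) f_2(n + d) ... f_k(n + (k-1) d), each f_j extended by zero."""
    total = Fraction(0)
    for n in range(1, N + 1):
        for d in range(-N, N + 1):
            term = Fraction(1)
            for j, f in enumerate(fs):
                term *= f.get(n + j * d, 0)
            total += term
    return total / (N * (2 * N + 1))


N = 10
ones = {n: 1 for n in range(1, N + 1)}
value = Lambda([ones, ones, ones], N)
```

With $f = 1_{[10]}$ and $k = 3$ there are 10 progressions with $d = 0$ and $2(8 + 6 + 4 + 2) = 40$ with $d \ne 0$, so `value` is $50/210 = 5/21$. Degenerate progressions ($d = 0$) are included in the count, as in the definition above.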

We now prove this theorem. We fix $k, \alpha$, and allow implied constants to
depend on these quantities. Our argument has some similarities with the
ergodic theory proof of (a polynomial generalisation of) Szemerédi's theorem
in [6], in particular in first reducing the problem to a problem concerning
nilsystems, which one then solves by the equidistribution theory of such
systems. However, one of the key steps in [6], in which one shows that multiple
recurrence is preserved under inverse limits, is more difficult to replicate
in the finitary setting than in the ergodic one (see [50]). Our argument
thus differs somewhat from [6], most notably by inserting a carefully chosen
weight $\mu(n, d)$ before proceeding.

As usual, we begin by applying the regularity lemma, Theorem 1.2. In
view of the generalised von Neumann theorem, Theorem 4.1, it is natural to
apply this theorem with $s = k - 2$ (which, as remarked in §4, is the
Cauchy-Schwarz complexity $s = s(\Psi)$ of the system $\Psi$ of linear forms
$n_1, n_1 + n_2, \dots, n_1 + (k-1)n_2$). If we do so, with a small parameter
$\varepsilon > 0$ depending on $\alpha, k$ to be chosen later, and a growth
function $\mathcal{F}$ depending on $\alpha, k, \varepsilon$ to be specified
later, we obtain a decomposition

\[ f(n) = f_{\mathrm{nil}}(n) + f_{\mathrm{sml}}(n) + f_{\mathrm{unf}}(n) \tag{6.1} \]

where

(i) $f_{\mathrm{nil}}$ is a $(\mathcal{F}(M), N)$-irrational degree $\le k - 2$
virtual nilsequence of complexity $\le M$ and scale $N$;

(ii) $f_{\mathrm{sml}}$ has an $L^2[N]$ norm of at most $\varepsilon$;

(iii) $f_{\mathrm{unf}}$ has a $U^{k-1}[N]$ norm of at most $1/\mathcal{F}(M)$;

(iv) $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}}$ are all bounded in
magnitude by 1; and

(v) $f_{\mathrm{nil}}$ and $f_{\mathrm{nil}} + f_{\mathrm{sml}}$ are non-negative.

As we shall soon see, the contribution of $f_{\mathrm{unf}}$ can be quickly
discarded using the generalised von Neumann theorem. If one could also easily
discard the contribution of the small term $f_{\mathrm{sml}}$, then matters would
simply reduce to verifying that the contribution of $f_{\mathrm{nil}}$ is bounded
away from zero, which would be an easy consequence of the counting lemma.
Unfortunately the small term $f_{\mathrm{sml}}$ is only moderately small (of size
$O(\varepsilon)$) rather than incredibly small (e.g. of size
$O(1/\mathcal{F}(M))$), and so one has to take a certain amount of care in
dealing with this term, which makes the analysis significantly more
delicate.$^{14}$

We turn to the details. Much as the key to proving Theorem 1.12 was to
establish Proposition 5.2, the key to establishing Szemerédi's theorem is the
following proposition.

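Proposition 6.2 (Szemerédi for $f_{\mathrm{nil}}$). Let $f_{\mathrm{nil}}$ be as
above, and let $\varepsilon > 0$. Then there exists a function
$\mu : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}^+$ supported on the set

\[ \{ (n, d) \in \mathbb{Z} \times \mathbb{Z} : d \in [-\varepsilon N, \varepsilon N];\ n + id \in [N] \text{ for all } i = 0, \dots, k - 1 \} \tag{6.2} \]

with

\[ \mathbb{E}_{n \in [N];\, d \in [-\varepsilon N, \varepsilon N]} \mu(n, d) = 1 + O(\varepsilon) \tag{6.3} \]

and with $\mu$ bounded in magnitude by $O_{M,\varepsilon}(1)$, such that

\[ f_{\mathrm{nil}}(n + id) = f_{\mathrm{nil}}(n) + O(\varepsilon) \tag{6.4} \]

whenever $0 \le i \le k - 1$ and $\mu(n, d) \ne 0$, and such that one has the
equidistribution property

\[ \mathbb{E}_{n \in [N]} \bigl| \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]} \mu(n - id, d) \bigr|^2 = 1 + O(\varepsilon) \tag{6.5} \]

for all $0 \le i \le k - 1$.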

The crucial feature of Proposition 6.2 for us is that, with the exception
of the uniform bound on $\mu$, the error terms here decay as
$\varepsilon \to 0$, even if the complexity bound $M$ on $f_{\mathrm{nil}}$ is
extremely large compared to $1/\varepsilon$.

The reader may benefit from a few words about the role of the function
$\mu$. Supposing that $f_{\mathrm{nil}}(n) = F(g(n)\Gamma)$ is a genuine
nilsequence, this function acts like a kind of "weight" on progressions
$(n, n + d, \dots, n + (k-1)d)$ which are "almost diagonal" in the sense that
$g(n)\Gamma \approx \dots \approx g(n + (k-1)d)\Gamma$ in $G/\Gamma$. The
condition (6.5) reflects the fact that the weighted number of almost diagonal
progressions whose $i$th point is $n$ is roughly independent of $n$. This
"non-concentration" of almost diagonal progressions ultimately means that the
error $f_{\mathrm{sml}}$ cannot destroy too many of these progressions, a
fact that is crucial for our argument.

14 In the language of ergodic theory, the problem here is that the characteristic factor 
is not necessarily a nilsystem, but may merely be a pro-nilsystem, that is, an inverse limit of 
nilsystems. A short, but not entirely trivial, argument of Furstenberg [11] shows that 
multiple recurrence is preserved under inverse limits. This argument was adapted with 
some difficulty to the finitary setting in [50]; our approach here is different and exploits 
some additional equidistribution properties of nilsystems, as well as using a carefully 
chosen weight $\mu(n,d)$. 
Let us assume Proposition 6.2 for now and see how it implies Theorem 
6.1. We use (6.1) to expand out the form $\Lambda_k(f, \dots, f)$ into $3^k$ terms. By 
Theorem 4.1, any term that involves $f_{\mathrm{unf}}$ will be of size $O(1/\mathcal{F}(M))$, thus 

$$\Lambda_k(f, \dots, f) = \Lambda_k(f_{\mathrm{nil}} + f_{\mathrm{sml}}, \dots, f_{\mathrm{nil}} + f_{\mathrm{sml}}) + O(1/\mathcal{F}(M)). \tag{6.6}$$

Next, we use the weight $\mu$ arising from Proposition 6.2 and the non-negativity 
of $f_{\mathrm{nil}} + f_{\mathrm{sml}}$ guaranteed by Theorem 1.2 to write 

$$\Lambda_k(f_{\mathrm{nil}} + f_{\mathrm{sml}}, \dots, f_{\mathrm{nil}} + f_{\mathrm{sml}}) \gg_{M,\varepsilon} \mathbb{E}_{n \in [N];\, d \in [-\varepsilon N, \varepsilon N]} (f_{\mathrm{nil}} + f_{\mathrm{sml}})(n) \cdots (f_{\mathrm{nil}} + f_{\mathrm{sml}})(n + (k-1)d)\, \mu(n,d).$$

We then expand this latter average into the sum of $2^k$ terms. The main 
term is 

$$\mathbb{E}_{n \in [N];\, d \in [-\varepsilon N, \varepsilon N]}\, f_{\mathrm{nil}}(n) \cdots f_{\mathrm{nil}}(n + (k-1)d)\, \mu(n,d), \tag{6.7}$$

and the other terms are error terms, involving at least one factor of $f_{\mathrm{sml}}$. 

Consider one of the error terms, involving the factor $f_{\mathrm{sml}}(n + id)$ (say) for 
some $0 \le i \le k-1$. We can bound the contribution of this term by 

$$\mathbb{E}_{n \in [N];\, d \in [-\varepsilon N, \varepsilon N]}\, |f_{\mathrm{sml}}(n + id)|\, \mu(n,d),$$

which by a change of variables $n \mapsto n - id$ we can write as 

$$\mathbb{E}_{n \in [N]}\, |f_{\mathrm{sml}}(n)|\, \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(n - id, d).$$

By Cauchy-Schwarz, (6.5), and the $L^2[N]$ bound on $f_{\mathrm{sml}}$, this is $O(\varepsilon)$. 
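Finally, we look at the main term (6.7). Using (6.4) we can approximate 

$$f_{\mathrm{nil}}(n) \cdots f_{\mathrm{nil}}(n + (k-1)d) = f_{\mathrm{nil}}(n)^k + O(\varepsilon)$$

and so (using (6.3)) we can write (6.7) as 

$$\mathbb{E}_{n \in [N]}\, f_{\mathrm{nil}}(n)^k\, \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(n,d) + O(\varepsilon).$$

Now, from (6.3) one has 

$$\mathbb{E}_{n \in [N]} \mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(n,d) = 1 + O(\varepsilon)$$

and hence, expanding the square and applying (6.5) with $i = 0$, 

$$\mathbb{E}_{n \in [N]} \big|\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(n,d) - 1\big|^2 = O(\varepsilon).$$

In particular, by Chebyshev's inequality, we have 

$$\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(n,d) = 1 + O(\varepsilon^{1/3})$$

for all $n \in E$, where $E \subset [N]$ has cardinality $|E| \ge (1 - O(\varepsilon^{1/3}))N$. Thus, 
for $\varepsilon$ small enough, we can bound (6.7) from below by 

$$\gg \mathbb{E}_{n \in [N]}\, 1_E(n)\, f_{\mathrm{nil}}(n)^k - O(\varepsilon^{1/3}).$$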

Now from hypothesis we have $\mathbb{E}_{n \in [N]} f(n) \gg 1$. From Cauchy-Schwarz we 
have 

$$\mathbb{E}_{n \in [N]}\, f_{\mathrm{sml}}(n) = O(\varepsilon),$$

and from Theorem 4.1 we also have 

$$\mathbb{E}_{n \in [N]}\, f_{\mathrm{unf}}(n) = O(\varepsilon)$$

if $\mathcal{F}$ is rapid enough. Thus if $\varepsilon$ is small enough we have $\mathbb{E}_{n \in [N]} f_{\mathrm{nil}}(n) \gg 1$, 
which implies that $\mathbb{E}_{n \in [N]} 1_E(n) f_{\mathrm{nil}}(n) \gg 1$, and hence by Hölder's inequality 
that $\mathbb{E}_{n \in [N]} 1_E(n) f_{\mathrm{nil}}(n)^k \gg 1$. Putting all this together, we conclude that 
(6.7) is $\gg 1$ if $\varepsilon$ is small enough, and thus 

$$\Lambda_k(f_{\mathrm{nil}} + f_{\mathrm{sml}}, \dots, f_{\mathrm{nil}} + f_{\mathrm{sml}}) \gg_{M,\varepsilon} 1.$$

Inserting this bound into (6.6) we obtain the claim, completing the proof of 
Szemerédi's theorem, if $\mathcal{F}$ is chosen sufficiently rapid. 

Proof of Proposition 6.2. Let us first establish this in the easy case $k = 3$. 
In this case, $f_{\mathrm{nil}}$ is essentially quasiperiodic, which will allow us to take 
$\mu(n,d)$ to be of the form 

$$\mu(n,d) = 1_{[2\varepsilon N, (1-2\varepsilon)N]}(n)\, \mu(d)$$

with $\mu(d)$ normalised by requiring 

$$\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu(d) = 1 + O(\varepsilon).$$

It is then easy to verify that both (6.3) and (6.5) follow from this normalisation. 
To establish the remaining claims in Proposition 6.2 we use the 
degree $\le 1$ nature of the orbit $n \mapsto g(n)\Gamma$, as in Section 5, to write $f_{\mathrm{nil}}$ as 

$$f_{\mathrm{nil}}(n) = F(n\theta)$$

for some $\theta \in (\mathbb{R}/\mathbb{Z})^D$ with $D = O_M(1)$ and some $F : (\mathbb{R}/\mathbb{Z})^D \to \mathbb{C}$ of 
Lipschitz constant $O_M(1)$. If one then sets $\mu$ to equal 

$$\mu(d) := \frac{2\varepsilon N}{|B|}\, 1_B(d)$$

where $B$ is the Bohr set 

$$B := \{d \in [-\varepsilon N, \varepsilon N] : d_{(\mathbb{R}/\mathbb{Z})^D}(d\theta, 0) \le \delta\}$$

and $\delta > 0$ is sufficiently small depending on $\varepsilon, M$, one easily verifies all the 
required claims. 
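The construction in the case $k = 3$ can be tried out numerically. The following is a minimal sketch, not the authors' construction verbatim: it assumes the sup of the coordinatewise distances to the nearest integer as the metric on $(\mathbb{R}/\mathbb{Z})^D$, takes the toy function $F(x) = \sum_j \cos(2\pi x_j)$, and normalises by the exact number of integers in $[-\varepsilon N, \varepsilon N]$, so that the average of $\mu(d)$ is exactly 1. The names `bohr_set`, `bohr_weight` and `f_nil` are illustrative only.

```python
import math

def circle_norm(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))

def bohr_set(N, eps, theta, delta):
    """All d in [-eps*N, eps*N] with ||d * theta_j||_{R/Z} <= delta for every j."""
    L = int(eps * N)
    return [d for d in range(-L, L + 1)
            if all(circle_norm(d * t) <= delta for t in theta)]

def bohr_weight(N, eps, theta, delta):
    """Weight mu(d) = (#range / #B) * 1_B(d); its average over the range is 1."""
    L = int(eps * N)
    B = set(bohr_set(N, eps, theta, delta))  # never empty: d = 0 lies in B
    scale = (2 * L + 1) / len(B)
    return {d: (scale if d in B else 0.0) for d in range(-L, L + 1)}

def f_nil(n, theta):
    """Toy quasiperiodic function F(n * theta), F(x) = sum_j cos(2*pi*x_j)."""
    return sum(math.cos(2 * math.pi * n * t) for t in theta)
```

For $d$ in the Bohr set and $i \le 2$, each coordinate of $id\theta$ is within $2\delta$ of an integer, so in this toy model $|f_{\mathrm{nil}}(n + id) - f_{\mathrm{nil}}(n)| \le 4\pi\delta D$; this is the analogue of (6.4), with $\delta$ chosen small in terms of $\varepsilon$.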

We now turn to the case $k > 3$, which is harder because $f_{\mathrm{nil}}$ is no longer 
quasiperiodic, and so $\mu(n,d)$ will have to depend more heavily on $n$ and not 
just on $d$. By arguing as in the previous section we can normalise $g(0)$ to 
equal id. We may also assume $N$ is sufficiently large depending on $\varepsilon, M$, 
since otherwise we may simply take $\mu(n,d) = 1_{[N]}(n)\,\delta_0(d)$ where $\delta_0$ is the 
Kronecker delta function at 0. We may of course also assume that $\varepsilon < 1$. 

We take an $O_M(1)$-rational Mal'cev basis $X_1, \dots, X_{\dim(G)}$ for the Lie algebra 
$\mathfrak{g} = \log G$ adapted to the filtration $G_\bullet$, as described in [30, Appendix 
A]. For any radius $r > 0$, we define the "ball" $B_r$ in $G$ to be the set of all 
group elements of the form 

$$\exp\Big(\sum_{j=1}^{\dim(G)} t_j X_j\Big) \tag{6.8}$$

where the $t_j$ are real numbers with $|t_j| \le r^{s+1-i}$ whenever $1 \le i \le s$ and 
$j \le \dim(G) - \dim(G_{(i)})$. Thus, when $r$ is small, $B_r$ is quite "narrow" (of diameter 
comparable to $r^s$) when projected down to $G/G_{(2)}$, but is relatively large 
when restricted to the top order component $G_{(s)}$ (of diameter comparable to 
$r$). This type of eccentricity is necessary in order to make $B_r$ approximately 
"normal" with respect to conjugations. Indeed, we have 

Lemma 6.3 (Approximate normality). Let $A, \delta > 0$, and let $g \in G$ be such 
that $d_G(g, \mathrm{id}) \le A$. Then we have the containments 

$$B_{(1-\delta)r} \subset g B_r g^{-1} \subset B_{(1+\delta)r} \tag{6.9}$$

whenever $r > 0$ is sufficiently small depending on $A, \delta, M$. 

Proof. We prove the second inclusion only, as the first is similar (and can 
also be deduced from the second). The conjugation action $h \mapsto g h g^{-1}$ on $G$ 
induces a Lie algebra automorphism $\exp(\mathrm{ad}(\log g)) : \mathfrak{g} \to \mathfrak{g}$. If we conjugate 
the group element (6.8) by $g$, we thus obtain 

$$\exp\Big(\sum_{j=1}^{\dim(G)} t_j \exp(\mathrm{ad}(\log g))(X_j)\Big).$$

But if $1 \le i \le s$ and $j \le \dim(G) - \dim(G_{(i)})$, we see from the Baker-Campbell-Hausdorff 
formula (C.2) that 

$$\exp(\mathrm{ad}(\log g))(X_j) = X_j + \sum_{j' = \dim(G) - \dim(G_{(i)}) + 1}^{\dim(G)} c_{j,j'} X_{j'}$$

for some coefficients $c_{j,j'}$ of size $O_{A,M}(1)$. Collecting all the coefficients 
together, we obtain the claim for $r$ small enough. □ 

Let $0 < \delta < 1/10$ be a small quantity (depending on $\varepsilon, M$), let $R$ be a 
large quantity depending on the same parameters, and let $r_0 > 0$ be an even 
smaller quantity than $\delta$ (depending on $\varepsilon, M, \delta, R$) to be chosen later. For 
each $r$ with $0 < r < r_0$ take a Lipschitz function $\varphi_r : G \to \mathbb{R}^+$ of Lipschitz 
norm $O_{M,r,\delta}(1)$ which is supported on $B_r$ and equals one on $B_{(1-\delta)r}$, and 
choose these functions so that $\varphi_r \le \varphi_{r'}$ pointwise whenever $0 < r < r' < r_0$. 
For each such $r$, let $\Phi_r : G/\Gamma \times G/\Gamma \to \mathbb{R}^+$ be the induced function 

$$\Phi_r(x, x') := \sum_{g \in G :\, gx = x'} \varphi_r(g).$$

This function $\Phi_r$ is supported near the diagonal of $G/\Gamma \times G/\Gamma$; indeed, 
$\Phi_r(x,x')$ is only non-zero when $x' \in B_r x$, and furthermore if $x' \in B_{(1-\delta)r}\, x$ 
then $\Phi_r(x,x') = 1$. If $r_0$ is chosen sufficiently small depending on $M, \delta$, we 
conclude from Lemma 6.3 that we have the approximate shift-invariance 

$$\Phi_{(1-3\delta)r}(x,x') \le \Phi_r(gx, gx') \le \Phi_{(1+3\delta)r}(x,x') \tag{6.10}$$

whenever $x, x' \in G/\Gamma$ and $g \in G$ is such that $d_G(g, \mathrm{id}) \le R$ (say). 

15 Readers may find it helpful to keep the hierarchy of scales 

$$1 \gg 1/k, \alpha \gg \varepsilon \gg 1/M \gg \delta \gg 1/R \gg r_0 > r \gg 1/\mathcal{F}(M) \gg 1/N > 0$$

in mind. 
We now define our cutoff function $\mu = \mu_r$ by 

$$\mu_r(n,d) := c_r\, 1_{q|d}\, 1_{[k\varepsilon N, (1-k\varepsilon)N]}(n)\, 1_{[-\delta N, \delta N]}(d) \prod_{i=1}^{k-1} \Phi_r(g(n)\Gamma, g(n+id)\Gamma), \tag{6.11}$$

where $c_r > 0$ is a normalisation constant to be chosen later. This function, 
as discussed immediately following the statement of Proposition 6.2, is a 
smooth cutoff to the set of "almost-diagonal" progressions in $G/\Gamma$. Specifically, 
$\mu_r$ is supported in (6.2), and also in the region where $g(n+id)\Gamma \in 
B_r\, g(n)\Gamma$ for $i = 0, \dots, k-1$, $|d| \le \delta N$, and $q | d$. From the Lipschitz nature 
of $F$ we thus have 

$$F(g(n+id)\Gamma, (n+id) \ (\mathrm{mod}\ q), (n+id)/N) = F(g(n)\Gamma, n \ (\mathrm{mod}\ q), n/N) + O_M(r_0)$$

for $(n,d)$ in the support of $\mu_r$, which gives (6.4) for $\mu_r$ if $r_0$ is sufficiently 
small depending on $\varepsilon, M$. 

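Next, we compute the expectation of $\mu_r(n,d)$, in order to work out what 
the normalisation constant $c_r$ should be. Observe that 

$$\mathbb{E}_{n \in [N],\, d \in [-\varepsilon N, \varepsilon N]}\, \mu_r(n,d) = \frac{\delta}{q\varepsilon}(1 + O(\varepsilon))\, c_r \times \mathbb{E}_{n \in [k\varepsilon N, (1-k\varepsilon)N];\, d \in [-\delta N, \delta N];\, q|d}\, \tilde{\Phi}_r(g(n)\Gamma, \dots, g(n+(k-1)d)\Gamma), \tag{6.12}$$

where $\tilde{\Phi}_r : (G/\Gamma)^k \to \mathbb{R}^+$ is the function 

$$\tilde{\Phi}_r(x_0, \dots, x_{k-1}) := \prod_{i=1}^{k-1} \Phi_r(x_0, x_i). \tag{6.13}$$

Observe that $\tilde{\Phi}_r$ has a Lipschitz norm of $O_{M,r,\delta}(1)$. Applying Theorem 1.11, 
we can express (6.12) as 

$$\frac{\delta}{q\varepsilon}(1 + O(\varepsilon))\, c_r \Big( \int_{G^\Psi/\Gamma^\Psi} \tilde{\Phi}_r + o_{\mathcal{F}(M) \to \infty;\, M, r, \delta}(1) + o_{N \to \infty;\, M, r, \delta}(1) \Big),$$

where $G^\Psi \subset G^k$ is the $k$th Hall-Petresco group, that is to say the Leibman 
group associated to the collection $\Psi = (\psi_0, \dots, \psi_{k-1})$ of linear forms 
$\psi_i(n,d) := n + id$ for $i = 0, \dots, k-1$. 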

The group $G^\Psi$ is an $O_M(1)$-rational subgroup of $G^k$, which itself has 
complexity $O_M(1)$. Meanwhile, the function $\tilde{\Phi}_r$ equals 1 on a ball of radius 
$\gg r^{O_M(1)}$ centred at the identity, and is bounded by 1 throughout. We 
conclude that the quantity 

$$v_r := \int_{G^\Psi/\Gamma^\Psi} \tilde{\Phi}_r$$

obeys the bounds 

$$r^{O_M(1)} \ll_M v_r \le 1.$$

Furthermore, from the properties of the functions $\varphi_r$, we have the 
monotonicity property 

$$v_{(1-\delta)r} \le v_r$$

for any $0 < r < r_0$. Applying the pigeonhole principle (using the fact that 
polynomial growth is always slower than exponential growth), and choosing 
$\delta \gg_{\varepsilon,M} 1$ sufficiently small depending on $\varepsilon, M$, one can thus find a radius 

$$r_0 > r \gg_{r_0, \varepsilon, \delta, M} 1$$

such that we have the regularity property 

$$(1 - O(\varepsilon))\, v_r \le v_{(1-3\delta)r} \le v_{(1+3\delta)r} \le (1 + O(\varepsilon))\, v_r. \tag{6.14}$$

Note that this idea of picking a "regular" radius originates, in additive 
combinatorics, in Bourgain's paper [8]. Fix from now on a value of $r$ with this 
property. If we then set 

$$c_r := \frac{q\varepsilon}{\delta v_r} \tag{6.15}$$

we conclude that 

$$c_r \ll_{M, r_0, \varepsilon} 1 \tag{6.16}$$

and 

$$\mathbb{E}_{n \in [N],\, d \in [-\varepsilon N, \varepsilon N]}\, \mu_r(n,d) = 1 + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, r_0}(1) + o_{N \to \infty;\, M, \varepsilon, r_0}(1).$$

This will give (6.3) provided that $r_0$ is chosen to depend on $M, \varepsilon, \delta$, that $\mathcal{F}$ 
is sufficiently rapid depending on $\varepsilon$, and $N$ is sufficiently large depending on 
$M, \varepsilon$. 
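The pigeonhole selection of a "regular" radius can be sketched as a simple search. The code below is an illustration under stated assumptions, not the argument of the paper: `v` stands for any non-decreasing function on $(0, r_0]$ with a polynomial lower bound, the target inequality is simplified to $v((1+3\delta)r) \le (1+\varepsilon)\, v((1-3\delta)r)$, and candidate radii are taken along a geometric sequence. If every candidate failed, $v$ would decay exponentially along the sequence, which contradicts the polynomial lower bound once $\delta$ is small compared with $\varepsilon$.

```python
def find_regular_radius(v, r0, delta, eps, max_steps=10_000):
    """Search for r with v((1+3*delta)*r) <= (1+eps) * v((1-3*delta)*r).

    v is assumed non-decreasing on (0, r0]. Candidates are chosen so that
    (1+3*delta)*r runs through r0 * rho**j, rho = (1-3*delta)/(1+3*delta).
    """
    rho = (1 - 3 * delta) / (1 + 3 * delta)
    top = r0  # current value of (1 + 3*delta) * r
    for _ in range(max_steps):
        # rho * top equals (1 - 3*delta) * r for the current candidate r
        if v(top) <= (1 + eps) * v(rho * top):
            return top / (1 + 3 * delta)
        top *= rho  # each failure costs a factor (1 + eps) in v
    raise ValueError("no regular radius found; try a smaller delta")
```

Each failed step loses a factor $1 + \varepsilon$ in `v` but only a factor `rho` in the radius, which is the "polynomial versus exponential growth" comparison invoked in the text.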

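Our remaining task, and the most difficult one, is to study the expression 
in (6.5). That is to say, we fix $0 \le i \le k-1$ and consider 

$$\mathbb{E}_{n \in [N]} \big|\mathbb{E}_{d \in [-\varepsilon N, \varepsilon N]}\, \mu_r(n - id, d)\big|^2. \tag{6.17}$$

Using (6.11), we can write this expression as 

$$(1 + O(\varepsilon)) \Big(\frac{\delta}{q\varepsilon} c_r\Big)^2\, \mathbb{E}_{n \in [k\varepsilon N, (1-k\varepsilon)N];\, d, d' \in [-\delta N, \delta N];\, q | d, d'}\, \tilde{\Phi}_r^{\otimes 2}\big(g(n - id)\Gamma, \dots, g(n + (k-1-i)d)\Gamma,\ g(n - id')\Gamma, \dots, g(n + (k-1-i)d')\Gamma\big)$$

where $\tilde{\Phi}_r^{\otimes 2} : (G/\Gamma)^k \times (G/\Gamma)^k \to \mathbb{R}^+$ is the tensor square 

$$\tilde{\Phi}_r^{\otimes 2}(x, x') := \tilde{\Phi}_r(x)\, \tilde{\Phi}_r(x').$$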
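Applying Theorem 1.11, we can thus express (6.17) as 

$$(1 + O(\varepsilon)) \Big(\frac{\delta}{q\varepsilon} c_r\Big)^2 \Big( \int_{G^{\Psi^{(i)}}/\Gamma^{\Psi^{(i)}}} \tilde{\Phi}_r^{\otimes 2} + o_{\mathcal{F}(M) \to \infty;\, \varepsilon, M, r_0}(1) + o_{N \to \infty;\, \varepsilon, M, r_0}(1) \Big) \tag{6.18}$$

where $G^{\Psi^{(i)}} \subset G^{2k}$ is the Leibman group associated to the collection 

$$\Psi^{(i)} := (\psi_{0,i}, \dots, \psi_{k-1,i}, \psi'_{0,i}, \dots, \psi'_{k-1,i})$$

of linear forms 

$$\psi_{j,i} : (n, d, d') \mapsto n + (j - i)d$$

and 

$$\psi'_{j,i} : (n, d, d') \mapsto n + (j - i)d'$$

for $j = 0, \dots, k-1$. 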

We will be establishing the following claim. 

Claim 6.4 (Approximate factorisation). We have 

$$\int_{G^{\Psi^{(i)}}/\Gamma^{\Psi^{(i)}}} \tilde{\Phi}_r^{\otimes 2} = (1 + O(\varepsilon))\, v_r^2. \tag{6.19}$$

Proof of Proposition 6.2 assuming Claim 6.4. Substitute back into (6.18) 
and use (6.15), (6.16) to conclude that 

$$(6.17) = 1 + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty;\, \varepsilon, M, r_0}(1) + o_{N \to \infty;\, \varepsilon, M, r_0}(1).$$

This gives the result upon choosing $r_0$ sufficiently small depending on $\varepsilon, M, \delta$, 
$\mathcal{F}$ sufficiently rapid depending on $\varepsilon$, and $N$ sufficiently large depending on 
$\varepsilon, M$. 

It remains to establish Claim 6.4. For notational simplicity we establish 
only the claim $i = 0$ (the others being very similar). The intuition 
behind this claim (and behind the key assertion that the number of almost-diagonal 
progressions whose $i$th term is $n$ does not depend on $n$) is that the 
linear forms $(\psi_{0,0}, \dots, \psi_{k-1,0})$ and $(\psi'_{0,0}, \dots, \psi'_{k-1,0})$ are almost independent 
of each other, except for the fact that they are coupled via the obvious 
identity $\psi_{0,0} = \psi'_{0,0}$. 

One way to encode this formally is to note that the Leibman group $G^{\Psi^{(0)}}$ 
is given by 

$$H := \{(x, x') \in G^\Psi \times G^\Psi : x_0 = x'_0\},$$

a product of two copies of the Hall-Petresco group $G^\Psi = \mathrm{HP}^k(G)$ fibred over 
the zeroth coordinate. To prove this, one may note that the containment 
$G^{\Psi^{(0)}} \subseteq H$ is obvious. On the other hand, one may compute directly using 
the dimension formula (3.1) that 

$$\dim(G^\Psi) = \dim(G) + \sum_{i=1}^{k-1} \dim(G_{(i)})$$

and 

$$\dim(G^{\Psi^{(0)}}) = \dim(G) + 2\sum_{i=1}^{k-1} \dim(G_{(i)})$$

and thus 

$$\dim(G^{\Psi^{(0)}}) = 2\dim(G^\Psi) - \dim(G) = \dim(H),$$

and so since both sides are connected, simply-connected nilpotent Lie groups 
(and so both are homeomorphic to their Lie algebras) we have $G^{\Psi^{(0)}} = H$. 
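Write $J_r$ for the integral appearing in (6.19), that is to say 

$$J_r := \int_{(x,x') \in G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi :\, x_0 = x'_0} \tilde{\Phi}_r^{\otimes 2}(x, x').$$

Let $R$ be some quantity, and suppose that $d_G(g, \mathrm{id}) \le R$. Then by the 
almost-invariance property (6.10) we have 

$$\int_{(x,x') \in G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi :\, x_0 = g x'_0} \tilde{\Phi}_{(1+3\delta)r}^{\otimes 2}(x, x') \ge J_r.$$

Integrate this over the ball $B_R := \{g \in G : d_G(g, \mathrm{id}) \le R\}$. Then we 
obtain 

$$\int_{G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi} X(x,x')\, \tilde{\Phi}_{(1+3\delta)r}^{\otimes 2}(x,x') \ge \mathrm{vol}(B_R)\, J_r,$$

where $X(x,x')$ is the number of $g \in B_R$ for which $x_0 = g x'_0 \ (\mathrm{mod}\ \Gamma)$, or 
equivalently 

$$X(x, x') := |\Gamma \cap x_0^{-1} B_R\, x'_0|.$$

Choose representatives $x_0, x'_0$ in some fundamental domain with $x_0, x'_0 = 
O_M(1)$. By a volume-packing argument and simple geometry we then have 

$$X(x,x') = \mathrm{vol}(B_R)\,(1 + o_{R \to \infty;\, M}(1)).$$

Comparing with the above we have 

$$v_{(1+3\delta)r}^2 = \int_{G^\Psi/\Gamma^\Psi \times G^\Psi/\Gamma^\Psi} \tilde{\Phi}_{(1+3\delta)r}^{\otimes 2} \ge J_r\,(1 + o_{R \to \infty;\, M}(1)),$$

and so by (6.14) we have 

$$J_r \le (1 + O(\varepsilon) + o_{R \to \infty;\, M}(1))\, v_r^2.$$

This gives the upper bound for Claim 6.4. The lower bound is proven 
similarly. This concludes the proof of Proposition 6.2 and thus Theorem 
6.1. 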
7. On a theorem of Gowers and Wolf 

Our aim in this section is to prove Theorem 1.13, whose statement we 
recall now. 

Theorem 7.1 (Theorem 1.13). Let $\Psi = (\psi_1, \dots, \psi_t)$ be a collection of linear 
forms $\psi_1, \dots, \psi_t : \mathbb{Z}^D \to \mathbb{Z}$, and let $s \ge 1$ be an integer such that the 
polynomials $\psi_1^{s+1}, \dots, \psi_t^{s+1}$ are linearly independent. Then for any function 
$f : [N] \to \mathbb{C}$ bounded in magnitude by 1 (and defined to be zero outside of 
$[N]$) obeying the bound $\|f\|_{U^{s+1}[N]} \le \delta$ for some $\delta > 0$, one has 

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) = o_{\delta \to 0;\, s, D, t, \Psi}(1).$$

Henceforth we allow all implied constants to depend on $D, t, s$, without 
indicating this explicitly. Let $s' = s'(\Psi)$ be the Cauchy-Schwarz complexity 
of the linear forms $\Psi$, as defined in Theorem 4.1. We may of course assume 
that $s' > s$, as Theorem 1.13 is immediate otherwise. We may also assume 
that $N$ is large depending on $\delta$, since otherwise the claim is trivial from a 
compactness argument. 

Let $\varepsilon > 0$ be a small number depending on $\delta$ to be chosen later, and 
let $\mathcal{F}$ be a growth function depending on $\varepsilon$ to be chosen later. Applying 
Theorem 1.2 at degree $s'$ (after first decomposing $f$ as a linear combination 
of $O(1)$ functions taking values in $[0,1]$), we can find a positive quantity 
$M = O_{\varepsilon, \mathcal{F}}(1)$ and a decomposition 

$$f = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \tag{7.1}$$

where: 

(i) $f_{\mathrm{nil}}$ is a $(\mathcal{F}(M), N)$-irrational virtual nilsequence of degree $\le s'$, 
complexity $\le M$, and scale $N$; 

(ii) $f_{\mathrm{sml}}$ has $L^2[N]$ norm at most $\varepsilon$; 
(iii) $f_{\mathrm{unf}}$ has $U^{s'+1}[N]$ norm at most $1/\mathcal{F}(M)$; 

(iv) All functions $f_{\mathrm{nil}}, f_{\mathrm{sml}}, f_{\mathrm{unf}}$ are bounded in magnitude by $O(1)$. 
We apply this decomposition to split the expression 

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) \tag{7.2}$$

as the sum of $3^t$ terms, in which each copy of $f$ has been replaced with either 
$f_{\mathrm{nil}}$, $f_{\mathrm{sml}}$, or $f_{\mathrm{unf}}$. 

Any term involving at least one factor of $f_{\mathrm{sml}}$ can be easily seen to be of 
size $O(\varepsilon)$ by crudely estimating all other factors by $O(1)$. By (4.1), any term 
involving at least one factor of $f_{\mathrm{unf}}$ is of size $O(1/\mathcal{F}(M))$, which is also of 
size $O(\varepsilon)$ if $\mathcal{F}$ is chosen to be sufficiently rapidly growing depending on $\varepsilon$. 
We can therefore express (7.2) as 

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f_{\mathrm{nil}}(\psi_i(n)) + O(\varepsilon).$$

By hypothesis, we can write 

$$f_{\mathrm{nil}}(n) = F(g(n)\Gamma, n \ (\mathrm{mod}\ q), n/N)$$

for some $q$ with $1 \le q \le M$, some degree $\le s'$, $(\mathcal{F}(M), N)$-irrational orbit 
$n \mapsto g(n)\Gamma$ of complexity $\le M$, and some Lipschitz function $F : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times 
\mathbb{R} \to \mathbb{C}$ of norm at most $M$. The mod $q$ and Archimedean behaviour in $f_{\mathrm{nil}}$ are 
nothing more than technical annoyances, and we set about eliminating them 
now. We encourage the reader to work through the heart of the argument, 
starting at (7.3) below, in the model case $f_{\mathrm{nil}}(n) = F(g(n)\Gamma)$. Let $\varepsilon'$ be a 
small quantity depending on $\varepsilon, M$ to be chosen later. We partition $[N]$ 
into progressions $P$ of spacing $q$ and length $\varepsilon' N$, plus a remainder set of size 
at most $O_M(1)$. We can then rewrite the above expression as 

$$\mathbb{E}_{P_1, \dots, P_D}\, \mathbb{E}_{n \in P_1 \times \dots \times P_D} \prod_{i=1}^{t} f_{\mathrm{nil}}(\psi_i(n)) + O(\varepsilon).$$

We abbreviate $P_1 \times \dots \times P_D$ as $P$. For a given $P$, observe that as $n$ ranges in 
$P$, the residue class of $\psi_i(n)$ modulo $q$ is equal to a fixed class $a_{P,i}$, and the 
value of $\psi_i(n)/N$ differs by at most $O_M(\varepsilon')$ from a fixed number $x_{P,i}$. We 
may assume that $x_{P,i} \in [0,1]$ for each $i$, otherwise the inner expectation is 
zero (except for a few "boundary" values of $P$ which give a net contribution 
of $O_M(\varepsilon')$). 

If $\varepsilon'$ is small enough depending on $\varepsilon, M$, the $O_M(\varepsilon')$ error in the above 
discussion can be absorbed in the $O(\varepsilon)$ error, and so we have 

$$\mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) = \mathbb{E}_P\, \mathbb{E}_{n \in P} \prod_{i=1}^{t} F(g(\psi_i(n))\Gamma, a_{P,i}, x_{P,i}) + O(\varepsilon).$$

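We now apply Theorem 1.11, which tells us that the right-hand side here is 

$$\mathbb{E}_P \int_{G^\Psi/\Gamma^\Psi} F_P + O(\varepsilon) + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1), \tag{7.3}$$

where as usual $G^\Psi \subseteq G^t$ is the Leibman group associated to the system of 
forms $\Psi = \{\psi_1, \dots, \psi_t\}$, and here $F_P : G^\Psi/\Gamma^\Psi \to \mathbb{C}$ is the function 

$$F_P((g_1, \dots, g_t)\Gamma^\Psi) := \prod_{i=1}^{t} F(g_i\Gamma, a_{P,i}, x_{P,i}).$$

16 Readers may find it helpful to keep the hierarchy of scales 

$$1 \gg \varepsilon \gg 1/M, 1/q \gg \varepsilon' \gg 1/\mathcal{F}(M) \gg \delta \gg 1/N > 0$$

in mind. 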
The heart of the matter is to obtain an upper bound on the quantity 
$\int_{G^\Psi/\Gamma^\Psi} F_P$ appearing in (7.3). To do this, of course, we need to make 
use of the assumption on the forms $\psi_1, \dots, \psi_t$, as well as the fact that 
$\|f\|_{U^{s+1}[N]} \le \delta$. 

The aforementioned assumption, namely that $\psi_1^{s+1}, \dots, \psi_t^{s+1}$ are linearly 
independent, implies that $\Psi^{[s+1]}$ is the whole of $\mathbb{R}^t$ which, in view of the 
definition of the Leibman group $G^\Psi$, implies that $G_{(s+1)}^t \le G^\Psi$. By Fubini's 
theorem, we thus have 

$$\int_{G^\Psi/\Gamma^\Psi} F_P = \int_{G^\Psi/\Gamma^\Psi} F_{P, \le s}$$

where 

$$F_{P, \le s}((g_1, \dots, g_t)\Gamma^\Psi) := \prod_{i=1}^{t} F_{\le s}(g_i\Gamma, a_{P,i}, x_{P,i}) \tag{7.4}$$

and $F_{\le s}$ is defined by averaging over cosets of the normal subgroup $G_{(s+1)}$, 
specifically 

$$F_{\le s}(g\Gamma, a, x) := \int_{G_{(s+1)}/\Gamma_{(s+1)}} F(g g_{s+1}\Gamma, a, x)\, dg_{s+1}.$$

Since $F$ was Lipschitz with norm $O_M(1)$, we see that $F_{\le s}$ is Lipschitz with 
norm $O_M(1)$ also. Also, since $F$ is bounded in magnitude by $O(1)$, so is 
$F_{\le s}$. 

As the forms $\psi_1^{s+1}, \dots, \psi_t^{s+1}$ are independent, we see in particular that $\psi_1$ 
is non-zero. This implies that the projection of $G^\Psi$ to the first coordinate 
$G$ is surjective. Meanwhile, from (7.4) and the boundedness of $F_{\le s}$ we have 
the crude upper bound 

$$|F_{P, \le s}((g_1, \dots, g_t)\Gamma^\Psi)| \ll |F_{\le s}(g_1\Gamma, a_{P,1}, x_{P,1})|.$$

From Fubini's theorem, we obtain the bound 

$$\Big| \int_{G^\Psi/\Gamma^\Psi} F_P \Big| \ll \int_{G/\Gamma} |F_{\le s}(\cdot, a_{P,1}, x_{P,1})|. \tag{7.5}$$

To proceed further, we need a crucial smallness estimate on $F_{\le s}$: 

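Proposition 7.2 ($F_{\le s}$ small in $L^2$). For any $a \in \mathbb{Z}/q\mathbb{Z}$ and $x \in [0,1]$, one 
has 

$$\int_{G/\Gamma} |F_{\le s}(\cdot, a, x)|^2 \le O(\varepsilon) + O_M(\varepsilon') + o_{\delta \to 0;\, M, \varepsilon, \varepsilon'}(1) + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1) + o_{N \to \infty;\, M, \varepsilon, \varepsilon'}(1).$$

Proof. By reflection symmetry we may assume that $x \le 1/2$. We may also 
round $x$ so that $x = q n_0/N$ for some $n_0 \in [N/2q]$, as the error in doing so 
can be easily absorbed by the Lipschitz properties of $F_{\le s}$. 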
By construction, $F_{\le s}$ is invariant on $G_{(s+1)}$-cosets, while $F - F_{\le s}$ 
integrates to zero on any such coset. In particular, $F_{\le s}(\cdot, a, x)$ and 
$(F - F_{\le s})(\cdot, a, x)$ are orthogonal, and thus 

$$\int_{G/\Gamma} |F_{\le s}(\cdot, a, x)|^2 = \int_{G/\Gamma} F\, \overline{F_{\le s}}(\cdot, a, x).$$

Applying Theorem 1.11 (really just the special case of this result asserting 
that $(g(n)\Gamma)_{n \in [N]}$ is equidistributed, cf. Lemma 3.7) and the Lipschitz nature of 
$F\, \overline{F_{\le s}}$, the right-hand side can be written as 

$$\mathbb{E}_{n \in [\varepsilon' N]}\, F\, \overline{F_{\le s}}(g(qn + qn_0 + a)\Gamma, a, x) + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1) + o_{N \to \infty;\, M, \varepsilon, \varepsilon'}(1).$$

Let $P$ be the progression $\{qn + qn_0 + a : n \in [\varepsilon' N]\}$. Then by a further use 
of the Lipschitz properties of $F$, we can rewrite the above expression as 

$$\mathbb{E}_{n \in P}\, F(g(n)\Gamma, n \ \mathrm{mod}\ q, n/N)\, \psi(n) + O_M(\varepsilon') + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1) + o_{N \to \infty;\, M, \varepsilon, \varepsilon'}(1) \tag{7.6}$$

where 

$$\psi(n) := \overline{F_{\le s}}(g(n)\Gamma, a, x).$$

Note that, as a consequence of the $G_{(s+1)}$-invariance of $F_{\le s}$, $\psi(n)$ is a degree 
$\le s$ nilsequence of complexity $O_M(1)$. Now by (7.1) we have 

$$F(g(n)\Gamma, n \ \mathrm{mod}\ q, n/N) = f(n) - f_{\mathrm{unf}}(n) - f_{\mathrm{sml}}(n).$$

The contribution of $f_{\mathrm{sml}}(n)$ to (7.6) is $O(\varepsilon)$ by the Cauchy-Schwarz inequality. 
Now consider the contribution of $f$. As noted above, $\psi$ is a degree $\le s$ 
nilsequence of complexity $O_M(1)$. Meanwhile, 
$\|f\|_{U^{s+1}[N]} \le \delta$ by hypothesis. Applying the converse to the inverse 
conjecture for the Gowers norms (first established in [28], though for a simple 
proof see [33, Appendix G]), we see that 

$$\mathbb{E}_{n \in P}\, f(n)\psi(n) = o_{\delta \to 0;\, M, \varepsilon, \varepsilon'}(1).$$

Similarly, since $\|f_{\mathrm{unf}}\|_{U^{s'+1}[N]} \le 1/\mathcal{F}(M)$ and $s' \ge s$, we have 

$$\mathbb{E}_{n \in P}\, f_{\mathrm{unf}}(n)\psi(n) = o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1).$$

Putting all of these estimates together, we obtain the claim. □ 

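Applying this bound and (7.5), we can thus bound (7.3) in magnitude by 

$$O(\varepsilon) + O_M(\varepsilon') + o_{\delta \to 0;\, M, \varepsilon, \varepsilon'}(1) + o_{\mathcal{F}(M) \to \infty;\, M, \varepsilon, \varepsilon'}(1) + o_{N \to \infty;\, M, \varepsilon, \varepsilon'}(1).$$

Choosing $\varepsilon'$ sufficiently small depending on $M$ and $\varepsilon$, and choosing $\mathcal{F}$ 
sufficiently rapidly growing depending on $\varepsilon$, and then using the bound $M = 
O_{\varepsilon, \mathcal{F}}(1)$ (and recalling that $N$ can be chosen large depending on $\delta$), we 
conclude that 

$$\Big| \mathbb{E}_{n \in [N]^D} \prod_{i=1}^{t} f(\psi_i(n)) \Big| \ll \varepsilon$$

whenever $\delta$ is sufficiently small depending on $\varepsilon$. Theorem 1.13 follows. 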
Remark. It seems certain that one can extend this result to the case 
when one has $t$ distinct functions $f_1, \dots, f_t : [N] \to \mathbb{C}$ rather than a single 
function $f : [N] \to \mathbb{C}$. The main change in the argument would be to use 
a version of the regularity lemma (Theorem 1.2) valid for several functions 
simultaneously, in which one regularises the $f_1, \dots, f_t$ using the same data 
$M, q, (G/\Gamma, G_\bullet), g(\cdot)$ (but allows each function $f_i$ to be given a separate 
Lipschitz function $F_i : G/\Gamma \times \mathbb{Z}/q\mathbb{Z} \times \mathbb{R} \to \mathbb{C}$). Such a result could be 
obtained by straightforward modifications to the proof of Theorem 1.2, but 
we do not pursue this matter here. 

Appendix A. Properties of polynomial sequences 

In this appendix we collect a variety of facts and definitions concerning 
polynomial sequences in nilpotent groups, all of which were required at some 
point in the paper proper. We take for granted the definition of filtration $G_\bullet$, 
and of the group $\mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ of polynomial sequences $g : \mathbb{Z}^D \to G$ adapted 
to $G_\bullet$; these notions were recalled in the introduction. 

Taylor expansions. Polynomial sequences may be described in terms 
of so-called Taylor expansions. In the lemma that follows we make use of the 
generalised binomial coefficients 

$$\binom{n}{i} := \binom{n_1}{i_1} \cdots \binom{n_D}{i_D}$$

where $n = (n_1, \dots, n_D)$, $i = (i_1, \dots, i_D)$, and for scalars $n$ and $i$ 

$$\binom{n}{i} := \frac{n(n-1) \cdots (n-i+1)}{i!}.$$

If $i = (i_1, \dots, i_D) \in \mathbb{N}^D$ is a $D$-tuple of non-negative integers we define the 
degree $|i| := i_1 + \dots + i_D$. Choose an arbitrary ordering on $\mathbb{N}^D$ with the 
property that $|i| \le |j|$ whenever $i \le j$. 

Lemma A.1 (Taylor expansions). Suppose that $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$. Then 
there are unique Taylor coefficients $g_i \in G_{(|i|)}$ with the property that 

$$g(n) = \prod_{i \in \mathbb{N}^D} g_i^{\binom{n}{i}}$$

for all $n \in \mathbb{Z}^D$. Conversely, every Taylor expansion of this type gives rise 
to a polynomial sequence $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$. 

Remarks. This is proven in [30, Lemma 6.7]. Note that, since $G$ is 
nilpotent, this is a finite expansion. In the case $D = 1$ (which will feature 
most prominently in the paper) it takes the form 

$$g(n) = g_0\, g_1^{\binom{n}{1}} \cdots g_s^{\binom{n}{s}}.$$

Note how, from the presentation of polynomial sequences as Taylor expansions, 
it is by no means clear (and somewhat remarkable) that they form a 
group under pointwise multiplication (Theorem 1.6). 
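In the simplest abelian situation ($D = 1$ and $G = \mathbb{R}$ written additively, so that products become sums and powers become multiples) the Taylor coefficients are just iterated forward differences at 0. The sketch below illustrates only this special case; the function names are hypothetical, and it restricts to non-negative integers $n$ because it relies on `math.comb`.

```python
from math import comb

def taylor_coefficients(values):
    """Given [g(0), g(1), ..., g(s)] for a polynomial g of degree <= s,
    return [g_0, ..., g_s] with g(n) = sum_j g_j * C(n, j).
    The j-th coefficient is the j-th forward difference of g at 0."""
    coeffs, row = [], list(values)
    while row:
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]  # forward difference
    return coeffs

def evaluate(coeffs, n):
    """Evaluate the binomial (Taylor) expansion at an integer n >= 0."""
    return sum(c * comb(n, j) for j, c in enumerate(coeffs))
```

For example, $g(n) = 3 + 2n + 5n^2$ has values 3, 10, 27 at $n = 0, 1, 2$, giving the coefficients $g_0 = 3$, $g_1 = 7$, $g_2 = 10$.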
Polynomial sequences that vary slowly, in a certain sense, are called 
smooth. We employ the following definition, which is the same as the one 
given in the introduction to [30]. 

Definition A.2 (Smooth sequences). Let $A$ be a positive parameter and let 
$N \ge 1$ be an integer. Let $\beta \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$. We say that $\beta$ is $(A, N)$-smooth 
if we have $d_G(\beta(n), \mathrm{id}) \le A$ and $d_G(\beta(n), \beta(n+1)) \le A/N$ for all $n \in [N]$. 

Here $d_G$ is a metric on the group $G$ constructed using the Mal'cev basis, 
see [30, Definition 2.2]. The precise definition of this metric is not terribly 
important for our analysis. 

In counterpoint to the notion of a smooth sequence is that of a rational 
sequence. 

Definition A.3 (Rational sequences). Let $A \ge 1$ be an integer, and let 
$(G/\Gamma, G_\bullet)$ be a filtered nilmanifold. Then an element $g \in G$ is $A$-rational 
if there is some $q$, $1 \le q \le A$, such that $g^q \in \Gamma$. If $\gamma \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is a 
polynomial sequence then we say that it is $A$-rational if $\gamma(n)$ is $A$-rational 
for every integer $n$. 

We have the following basic facts about smooth and rational sequences: 

Lemma A.4 (Basic facts). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of 
complexity $\le M_0$. By a "sequence", we mean an element of $\mathrm{poly}(\mathbb{Z}, G_\bullet)$. Then: 

(i) The product of two $(A, N)$-smooth sequences is $(O_{M_0, A}(1), N)$-smooth; 

(ii) The product of two $A$-rational sequences is $O_{M_0, A}(1)$-rational; 
(iii) Any $A$-rational sequence is periodic with period $O_{M_0, A}(1)$. 

Proof. For (i), see [30, Lemma 10.1]; for (ii), see [30, Lemma A.11 (v)]; 
and for (iii), see [30, Lemma A.12 (ii)]. In fact these results hold in the 
multiparameter setting, with polynomially effective bounds, but we will not 
need these facts here. □ 

17 One could take an "adelic" perspective here and view smooth sequences as those that 
are local to the Archimedean place $\infty$, while rational sequences are those that are local 
to finite places $p$. 

We turn now to an important new definition for this paper, that of an 
irrational polynomial sequence. In [30], much emphasis was placed on the 
notion of an equidistributed polynomial sequence $g : \mathbb{Z} \to G$: one for which 
the orbit $(g(n)\Gamma)_{n \in [N]}$ is close to equidistributed on $G/\Gamma$. The notion of 
an irrational sequence implies equidistribution (see Lemma 3.7, which is 
also a special case of Theorem 1.11), but also encodes an assertion that the 
filtration $G_\bullet$ is in some sense "minimal" for the sequence. To illustrate the 
difference, let us think about a simple abelian case in which $G/\Gamma$ is just the 
unit circle $\mathbb{R}/\mathbb{Z}$ (written additively), and $g : \mathbb{Z} \to \mathbb{R}$ is a polynomial 
sequence 

$$g(n) = \alpha_0 + \alpha_1 n + \dots + \alpha_s n^s. \tag{A.1}$$
This sequence is adapted to the filtration $G_\bullet$ in which $G_{(i)} = \mathbb{R}$ for $i \le s$ and 
$G_{(i)} = \{0\}$ for $i > s$. Qualitatively speaking, $g$ is equidistributed if at least 
one of $\alpha_1, \dots, \alpha_s$ is irrational; in contrast, $g$ is irrational with respect to this 
filtration if it is $\alpha_s$ which is irrational. Note that if $s > 1$ and $\alpha_s$ is rational, 
then (after removing the periodic component $\alpha_s n^s$ from $g$) $g$ is now adapted 
to the filtration $G'_\bullet$ in which $G'_{(i)} = \mathbb{R}$ for $i \le s-1$ and $G'_{(i)} = \{0\}$ for 
$i > s-1$, which has a strictly smaller total dimension. This basic example 
is the model for the more sophisticated result in Lemma 2.9. 

Let us turn now to the precise definition in the more general setting of 
Lie group-valued polynomial sequences, in which the role of the $\alpha_i$ is played 
by the Taylor coefficients of $g$. We need a preliminary definition. 

Definition A.5 ($i$-horizontal characters). Let $(G/\Gamma, G_\bullet)$ be a filtered 
nilmanifold of degree $\le s$ with filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$. Then by an 
$i$-horizontal character we mean a continuous homomorphism $\xi_i : G_{(i)} \to 
\mathbb{R}$ which vanishes on $G_{(i+1)}$, $\Gamma_{(i)}$ and on $[G_{(j)}, G_{(i-j)}]$ for any $0 \le j \le i$. We 
say that such a character is non-trivial if it is not constant. We can assign 
a notion of complexity by taking a Mal'cev basis adapted to $G_\bullet$, whereupon 
one has a natural isomorphism $G_{(i)}/G_{(i+1)} \cong \mathbb{R}^k$. Writing $\psi(g_i)$ for 
the coordinates of $g_i \ (\mathrm{mod}\ G_{(i+1)})$, any $i$-horizontal character has the form 
$\xi_i(g_i) = m \cdot \psi(g_i)$ for some vector $m = (m_1, \dots, m_k)$ of integers. We may 
then define the complexity of $\xi_i$ to be $|m_1| + \dots + |m_k|$. 

The list of subgroups on which $\xi_i$ is required to vanish looks rather 
restrictive and slightly unnatural at first sight. Roughly speaking, this list is 
intended to isolate that behaviour which genuinely "belongs" to the degree 
$i$ portion of the filtered nilmanifold, as opposed to arising from those terms 
of higher or lower degree, or which disappear after quotienting out by the 
lattice $\Gamma$. 

Definition A.6 (Irrationality). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of 
degree $\le s$ with filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$. Let $g_i \in G_{(i)}$. Let $A, N > 0$. Then 
we say that $g_i$ is $(A, N)$-irrational in $G_{(i)}$ if for every non-trivial $i$-horizontal 
character $\xi_i : G_{(i)} \to \mathbb{R}$ of complexity $\le A$ one has $\|\xi_i(g_i)\|_{\mathbb{R}/\mathbb{Z}} \ge A/N^i$. We 
say that the sequence $g(n)$ is $(A, N)$-irrational if its $i$th Taylor coefficient $g_i$ 
is $(A, N)$-irrational in $G_{(i)}$ for each $i$, $1 \le i \le s$. 

To understand this definition, it is helpful to consider examples. We leave 
it as an exercise to check that in the abelian case (A.1) this amounts to 
stipulating that the top coefficient of $g$ is poorly approximated by rationals, 
thus $\|q\alpha_s\|_{\mathbb{R}/\mathbb{Z}} \ge A'/N^s$ whenever $1 \le q < A'$. 
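The abelian condition is easy to test numerically. The following minimal sketch checks $\|q\alpha_s\|_{\mathbb{R}/\mathbb{Z}} \ge A/N^s$ for all integers $1 \le q \le A$; using the same parameter $A$ on both sides and an inclusive range for $q$ is a simplifying assumption (the text allows a modified parameter $A'$), and the function names are illustrative. Floating-point arithmetic is used, so the check is only indicative near the threshold.

```python
def dist_to_nearest_integer(x):
    """The norm ||x||_{R/Z}: distance from x to the nearest integer."""
    return abs(x - round(x))

def top_coefficient_is_irrational(alpha_s, s, A, N):
    """Check ||q * alpha_s||_{R/Z} >= A / N**s for every integer 1 <= q <= A."""
    threshold = A / N ** s
    return all(dist_to_nearest_integer(q * alpha_s) >= threshold
               for q in range(1, int(A) + 1))
```

Note that the notion depends on the scale $N$: a coefficient such as $1/3 + 10^{-7}$ fails the test with $s = 2$, $A = 5$, $N = 100$, because $3\alpha_s$ is within $3 \cdot 10^{-7}$ of an integer, but passes once $N = 10^4$, when the threshold drops to $5 \cdot 10^{-8}$.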

A second interesting case to examine is that in which $g(n) = g^n$ is a linear 
polynomial sequence adapted to the lower central series filtration $(G_i)_{i=0}^{\infty}$. 
For the lower central series filtration there are no nontrivial $i$-horizontal 
characters when $i \ge 2$, and 1-horizontal characters are the same thing as 
horizontal characters in the sense of [30, Definition 1.5]. It follows from this 
and [30, Theorem 1.16] that $g(n)$ is irrational if and only if $(g(n)\Gamma)_{n \in [N]}$ is 
equidistributed. Now polynomial sequences that are not linear do not arise 
naturally in ergodic-theoretic settings such as those considered in [UHT], and 
thus the equivalence of the notions of "irrational" and "equidistributed" 
in this setting explains why the former concept has not appeared in the 
literature before. The need for it is a new feature of the quantitative world, 
as is the need for polynomial nilsequences themselves, for reasons explained 
on pU §1]. 

The following third example is also edifying. Take $g(n)$ to be any polynomial 
sequence on the Heisenberg group, for example 

$$g(n) = \begin{pmatrix} 1 & \alpha n & \gamma n^2 \\ 0 & 1 & \beta n \\ 0 & 0 & 1 \end{pmatrix}.$$

This sequence is a polynomial sequence adapted to the lower central series 
filtration $G_0 = G_1 = G$, $G_2 = [G, G]$, $G_3 = \{\mathrm{id}\}$, and it will be equidistributed 
in that setting for generic $\alpha, \beta, \gamma$. However $g$ is also a polynomial sequence 
with respect to some much flabbier filtrations, for example the one in which 

$$G_{(0)} = G_{(1)} = G_{(2)} = \dots = G_{(10)} = G, \quad G_{(11)} = \dots = G_{(100)} = [G, G]$$

and $G_{(i)} = \{\mathrm{id}\}$ for $i \ge 101$. It is easy to check that $g$ is not irrational in 
this setting, and indeed irrationality is somehow detecting the fact that a 
given filtration $G_\bullet$ is minimal for $g$. This point is quite clear in the proof 
of Lemma 2.9 (which itself depends on Lemma A.7 below), where the failure 
of a sequence to be irrational is used to create a coarser filtration for a 
polynomial sequence related to $g$. 
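One can verify by direct computation that the Heisenberg example is a polynomial sequence for the lower central series filtration. The sketch below assumes the derivative convention $\partial_h g(n) := g(n+h)\, g(n)^{-1}$ and uses integer parameters (the defaults $\alpha = 2$, $\beta = 3$, $\gamma = 5$ are arbitrary choices) so that all arithmetic is exact; it checks that second derivatives are central, that is, lie in $[G, G]$, and that third derivatives are trivial. All function names are illustrative.

```python
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def g(n, a=2, b=3, c=5):
    """The sequence g(n) with rows (1, a*n, c*n^2), (0, 1, b*n), (0, 0, 1)."""
    return ((1, a * n, c * n * n), (0, 1, b * n), (0, 0, 1))

def mul(X, Y):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def inv(X):
    """Inverse of a unipotent upper triangular 3x3 matrix."""
    x, z, y = X[0][1], X[0][2], X[1][2]
    return ((1, -x, x * y - z), (0, 1, -y), (0, 0, 1))

def derivative(f, h):
    """Discrete derivative n -> f(n + h) * f(n)^{-1}."""
    return lambda n: mul(f(n + h), inv(f(n)))

def is_central(X):
    """True if X lies in the centre [G, G] of the Heisenberg group."""
    return X[0][1] == 0 and X[1][2] == 0
```

With these defaults a short calculation gives $\partial_{h'} \partial_h g(n)$ with upper-right entry $(2\gamma - \alpha\beta)\, h h'$ and zeros in the other off-diagonal positions, independent of $n$.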

Lemma A.7. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of degree $\le s$ 
with filtration $G_\bullet = (G_{(i)})_{i=0}^{\infty}$. Suppose that $g$ is not $(A, N)$-irrational. Then 
there is an index $i$, $1 \le i \le s$, such that the $i$th Taylor coefficient $g_i$ factors 
as $\beta_i g'_i \gamma_i$, where $\beta_i, g'_i, \gamma_i \in G_{(i)}$, $g'_i$ lies in the kernel of some $i$-horizontal 
character $\xi_i : G_{(i)} \to \mathbb{R}$ of complexity at most $A$, $d_G(\beta_i, \mathrm{id}) = O_{A,M}(N^{-i})$ 
and $\gamma_i$ is $O_{A,M}(1)$-rational. 

Proof. The proof is (unsurprisingly) extremely similar to that of [30, Lemma 7.9]. Reversing the definition of irrational polynomial sequence, we see that there is an index $i$ together with an $i$-horizontal character $\xi_i : G_{(i)} \to \mathbb{R}$ such that $\|\xi_i(g_i)\|_{\mathbb{R}/\mathbb{Z}} \leq A/N^i$. It is convenient at this point to work in a Mal'cev coordinate system adapted to $G_\bullet$, whereby $G_{(i)}/G_{(i+1)}$ may be identified with $\mathbb{R}^k$ and $\Gamma_{(i)}/\Gamma_{(i+1)}$ with $\mathbb{Z}^k$. If $g_i \in G_{(i)}$ then, as above, we write $\psi(g_i) \in \mathbb{R}^k$ for the corresponding coordinates. Then $\xi_i$ has the form $\xi_i(g_i) = \vec{m} \cdot \psi(g_i)$ for some vector $\vec{m} = (m_1, \ldots, m_k)$ of integers with $|m_1| + \cdots + |m_k| \leq A$. Now by assumption we have $\|\vec{m} \cdot \psi(g_i)\|_{\mathbb{R}/\mathbb{Z}} \leq A/N^i$, and therefore $\vec{m} \cdot \psi(g_i) = r + O(A/N^i)$ for some integer $r$. It follows from simple linear algebra that we may write $\psi(g_i) = t + u + v$, where $\vec{m} \cdot u = 0$, the coordinates of $v$ lie in $\frac{1}{Q}\mathbb{Z}$ for some $Q = O_A(1)$ and each coordinate of $t$ is $O_A(1/N^i)$. Now choose $\beta_i \in G_{(i)}$ in such a way that $\psi(\beta_i) = t$ and $d_G(\beta_i, \mathrm{id}) = O_{A,M}(1/N^i)$, choose an $O_{A,M}(1)$-rational element $\gamma_i \in G_{(i)}$ with $\psi(\gamma_i) = v$, and finally choose $g'_i$ so that $g_i = \beta_i g'_i \gamma_i$. Then one automatically has $\psi(g'_i) = u$, which means that $g'_i$ lies in the kernel of the homomorphism $\xi_i$. □
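The "simple linear algebra" step in this proof can be made explicit. The sketch below gives one possible choice of the splitting $\psi(g_i) = t + u + v$, not necessarily the one the authors intend. It takes $Q = |\vec{m}|^2$, puts the small error $\theta = \vec{m} \cdot x - r$ into $t$ along the direction of $\vec{m}$, and puts the integer part $r$ into $v$. The function name `decompose` and the use of exact rationals are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor

def decompose(m, x):
    """Split x as t + u + v with m.u = 0, v in (1/Q)Z^k and |t_j| <= |m.x - r|.

    m is a nonzero integer vector and x a vector of Fractions; r is the integer
    nearest to m.x.  This is one illustrative choice of the decomposition.
    """
    Q = sum(mi * mi for mi in m)                  # Q = |m|^2, at most A^2
    dot = sum(mi * xi for mi, xi in zip(m, x))
    r = floor(dot + Fraction(1, 2))               # nearest integer to m.x
    theta = dot - r                               # the small error term
    t = [theta * mi / Q for mi in m]              # satisfies m.t = theta
    v = [Fraction(r * mi, Q) for mi in m]         # satisfies m.v = r
    u = [xi - ti - vi for xi, ti, vi in zip(x, t, v)]
    return t, u, v, Q
```

With this choice $\vec{m} \cdot u = (r + \theta) - \theta - r = 0$, and each coordinate of $t$ is bounded by $|\theta|$ because $|m_j| \leq Q$ for integer entries.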

Finally, we record a convenient scaling lemma. 

Lemma A.8 (Scaling lemma). Let $(G/\Gamma, G_\bullet)$ be a filtered nilmanifold of complexity $\leq M$. If $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ is $(A, N)$-irrational, $r \in [-N, N]$, and $1 \leq q \leq M$, then the sequence $n \mapsto g(nq + r)$ is $(\gg_{M,\varepsilon} A, \varepsilon N)$-irrational for any $\varepsilon > 0$.

Proof. We need to show that the $i$-th Taylor coefficient of $n \mapsto g(nq + r)$ is $(\gg_{M,\varepsilon} A, \varepsilon N)$-irrational for each $i \geq 0$. Note that we may assume $i \leq M$ since the filtered manifold has degree $\leq M$.

Fix $i$. We may quotient out the nilmanifold by the normal subgroups $G_{(i+1)}$ and $[G_{(j)}, G_{(i-j)}]$ for $0 \leq j \leq i$, since these do not affect the irrationality of the $i$-th coefficient. We may then expand $g$ as a Taylor series
$$g(n) = \prod_{j=0}^{i} g_j^{\binom{n}{j}}$$
and thus
$$g(qn + r) = \prod_{j=0}^{i} g_j^{\binom{qn+r}{j}}.$$

Expanding out the binomial coefficient and using many applications of the Baker-Campbell-Hausdorff formula, we obtain
$$g(qn + r) = \Big(\prod_{j=0}^{i-1} (g'_j)^{\binom{n}{j}}\Big) g_i^{q^i \binom{n}{i}}$$
for some $g'_j \in G_{(j)}$; the point being that the Baker-Campbell-Hausdorff term cannot generate any terms involving polynomials in $n$ of degree $i$ or higher due to the fact that the groups $G_{(i+1)}$ and $[G_{(j)}, G_{(i-j)}]$ have been quotiented out. As a consequence, we see that the $i$-th Taylor coefficient of $n \mapsto g(qn + r)$ is $g_i^{q^i}$, and the claim is easily verified. □
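The exponent $q^i$ in the top Taylor coefficient comes from a purely combinatorial fact: when $\binom{qn+r}{i}$ is expanded in the basis $\binom{n}{j}$, the coefficient of $\binom{n}{i}$ is $q^i$. A minimal sketch checking this with forward differences follows. The function names are hypothetical, and only the scalar identity is tested, not the group-theoretic bookkeeping of the proof.

```python
from fractions import Fraction

def binom(x, j):
    """Generalised binomial coefficient x(x-1)...(x-j+1)/j! for any integer x."""
    result = Fraction(1)
    for k in range(j):
        result = result * (x - k) / (k + 1)
    return result

def forward_differences(f, order):
    """Return [f(0), (Δf)(0), ..., (Δ^order f)(0)], the Newton coefficients of f."""
    values = [f(n) for n in range(order + 1)]
    coeffs = []
    for _ in range(order + 1):
        coeffs.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return coeffs

def top_coefficient(q, r, i):
    """Coefficient of binom(n, i) in the expansion of binom(q*n + r, i)."""
    return forward_differences(lambda n: binom(q * n + r, i), i)[i]
```

The shift $r$ only affects the lower-order coefficients, which is why it plays no role in the $i$-th Taylor coefficient.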

Appendix B. A multiparameter equidistribution result 

The purpose of this appendix is to prove Theorem 3.6, which we recall here again.

Theorem 3.6. Suppose that $(G/\Gamma, G_\bullet)$ is a filtered nilmanifold of complexity $\leq M$ and that $g \in \mathrm{poly}(\mathbb{Z}^D, G_\bullet)$ is a polynomial sequence for some $D \leq M$. Suppose that $\Lambda \subseteq \mathbb{Z}^D$ is a lattice of index $\leq M$, that $n_0 \in \mathbb{Z}^D$ has magnitude $\leq M$, and that $P \subseteq [-N, N]^D$ is a convex body. Suppose that $\delta > 0$, and that
$$\Big| \sum_{n \in (n_0 + \Lambda) \cap P} F(g(n)\Gamma) - \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} \int_{G/\Gamma} F \Big| > \delta \|F\|_{\mathrm{Lip}} N^D$$
for some Lipschitz function $F : G/\Gamma \to \mathbb{C}$. Then there is a nontrivial homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$, has complexity $O_M(1)$ and such that

$$\|\eta \circ g\|_{C^\infty([N]^D)} = O_{\delta,M}(1).$$

Recall from [30, Definition 8.2] that the norm $\|g\|_{C^\infty([N]^D)}$ of a polynomial sequence $g : [N]^D \to \mathbb{R}$ is given by the formula

$$\|g\|_{C^\infty([N]^D)} = \sup_{i \in \mathbb{N}^D} N^{|i|} \|g_i\|_{\mathbb{R}/\mathbb{Z}}$$

where $g_i$ are the Taylor coefficients of $g$, thus

$$g(n) = \sum_{i \in \mathbb{N}^D} g_i \binom{n}{i}.$$
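To make the definition concrete, here is a minimal one-parameter sketch ($D = 1$) that recovers the Taylor coefficients by forward differences and evaluates $\sup_i N^{i} \|g_i\|_{\mathbb{R}/\mathbb{Z}}$. Two assumptions are made for illustration: the polynomial is given exactly with rational values, and the index $i = 0$ is omitted from the supremum so that constant shifts of $g$ do not contribute. The function names are hypothetical.

```python
from fractions import Fraction

def taylor_coefficients(g, degree):
    """Coefficients g_i with g(n) = sum_i g_i * binom(n, i), via forward differences."""
    values = [Fraction(g(n)) for n in range(degree + 1)]
    coeffs = []
    for _ in range(degree + 1):
        coeffs.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return coeffs

def dist_to_integers(x):
    """The norm ||x||_{R/Z}: distance from x to the nearest integer."""
    frac = x - (x // 1)
    return min(frac, 1 - frac)

def smoothness_norm(g, degree, N):
    """max over 1 <= i <= degree of N^i * ||g_i||_{R/Z} (D = 1, i = 0 omitted)."""
    coeffs = taylor_coefficients(g, degree)
    return max(N ** i * dist_to_integers(coeffs[i]) for i in range(1, degree + 1))
```

For example, with $N = 10$ and $g(n) = \frac{1}{20}\binom{n}{1} + \frac{3}{100}\binom{n}{2}$ the two candidate values are $10 \cdot \frac{1}{20}$ and $100 \cdot \frac{3}{100}$, so the quadratic coefficient dominates.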

We now prove the theorem, allowing all implied constants to depend on $\delta$ and $M$. We may assume that $N$ is sufficiently large depending on $\delta, M$, since the claim is trivial otherwise. A simple volume packing argument (using [31, Corollary A.2], for example, to control the boundary terms) shows that

$$|(n_0 + \Lambda) \cap P| = \frac{\mathrm{vol}(P)}{[\mathbb{Z}^D : \Lambda]} + o_{N \to \infty}(N^D).$$

As a consequence, for $N$ large enough we may subtract off the mean of $F$ and normalise $F$ to have Lipschitz norm 1 and mean zero, thus

$$\Big| \sum_{n \in (n_0 + \Lambda) \cap P} F(g(n)\Gamma) \Big| \gg N^D.$$

As $\Lambda$ has index $\leq M$ in $\mathbb{Z}^D$, it contains the sublattice $q\mathbb{Z}^D$ for some positive integer $q = O(1)$. By the pigeonhole principle, we may thus find $n_1 \in \mathbb{Z}^D$ of magnitude $O(1)$ such that

$$\Big| \sum_{n \in (n_1 + q\mathbb{Z}^D) \cap P} F(g(n)\Gamma) \Big| \gg N^D,$$

and thus

$$\Big| \sum_{n \in \mathbb{Z}^D \cap P'} F(g(qn + n_1)\Gamma) \Big| \gg N^D$$

for some convex body $P'$ contained in a ball of radius $O(N)$ centered at the origin.
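The claim that a lattice $\Lambda$ of bounded index contains some $q\mathbb{Z}^D$ with $q = O(1)$ can be illustrated by taking $q$ equal to the index itself, since the quotient $\mathbb{Z}^D/\Lambda$ is a finite group of that order. The sketch below checks this for $D = 2$ with $\Lambda$ spanned by the columns of an integer matrix. The restriction to $D = 2$, the choice $q = [\mathbb{Z}^2 : \Lambda]$, and the function names are assumptions made for illustration; the argument in the text only needs some $q = O(1)$.

```python
from fractions import Fraction

def lattice_coordinates(B, v):
    """Solve B c = v for a 2x2 integer matrix B with nonzero determinant (Cramer's rule)."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [Fraction(d * v[0] - b * v[1], det), Fraction(a * v[1] - c * v[0], det)]

def in_lattice(B, v):
    """True when v is an integer combination of the columns of B."""
    return all(coord.denominator == 1 for coord in lattice_coordinates(B, v))

def index(B):
    """Index in Z^2 of the lattice spanned by the columns of B."""
    (a, b), (c, d) = B
    return abs(a * d - b * c)
```

For the lattice spanned by $(2, 0)$ and $(1, 3)$ the index is 6, and every point of $6\mathbb{Z}^2$ lies in the lattice even though $(1, 0)$ does not.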

By subdividing $P'$ into cubes of sidelength $\varepsilon N$ for some sufficiently small $\varepsilon > 0$ (and again using [31, Corollary A.2] to control the boundary terms), and then applying the pigeonhole principle, we see that

$$\Big| \sum_{n \in \mathbb{Z}^D \cap (n_2 + [\varepsilon N]^D)} F(g(qn + n_1)\Gamma) \Big| \gg N^D$$

for some $\varepsilon \gg 1$ and $n_2 = O(N)$. We can rearrange this as
$$\Big| \sum_{n \in \mathbb{Z}^D \cap [\varepsilon N]^D} F(g(qn + n_3)\Gamma) \Big| \gg N^D$$

for some $n_3 = O(N)$.

We may now invoke [30, Theorem 8.6] to conclude that there exists a nontrivial homomorphism $\eta : G \to \mathbb{R}$ which vanishes on $\Gamma$, has complexity $O(1)$ and such that

$$\|\eta \circ g(q \cdot + n_3)\|_{C^\infty([N]^D)} \ll 1.$$

Applying [30, Lemma 8.4] we conclude that

$$\|Q\eta \circ g(\cdot + n_3)\|_{C^\infty([N]^D)} \ll 1$$

for some non-negative integer $Q = O(1)$. Shifting the Taylor expansion by $n_3$, we conclude that

$$\|Q\eta \circ g\|_{C^\infty([N]^D)} \ll 1.$$
The claim follows (with $\eta$ replaced by $Q\eta$).

Appendix C. The Baker-Campbell-Hausdorff formula 

Let $G$ be a connected, simply connected nilpotent Lie group, and let $\exp : \mathfrak{g} \to G$ and $\log : G \to \mathfrak{g}$ be the associated exponential and logarithm maps between $G$ and its Lie algebra $\mathfrak{g}$. The Baker-Campbell-Hausdorff formula asserts that

$$\exp(X_1) \exp(X_2) = \exp\Big(X_1 + X_2 + \frac{1}{2}[X_1, X_2] + \sum_{\alpha} c_\alpha X_\alpha\Big)$$

for any $X_1, X_2 \in \mathfrak{g}$, where $\alpha$ ranges over a finite set of labels, $c_\alpha$ are real constants, and each $X_\alpha$ is an iterated Lie bracket of $k_1 = k_{1,\alpha}$ copies of $X_1$ and $k_2 = k_{2,\alpha}$ copies of $X_2$ where $k_1, k_2 \geq 1$ and $k_1 + k_2 \geq 2$.
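In the simplest nonabelian case the formula can be verified by hand. For the Heisenberg Lie algebra of strictly upper triangular $3 \times 3$ matrices all iterated brackets of length three or more vanish, so the formula reduces to $\exp(X_1)\exp(X_2) = \exp(X_1 + X_2 + \frac{1}{2}[X_1, X_2])$. The sketch below checks this reduced identity in exact arithmetic. The restriction to the Heisenberg algebra and all helper names are assumptions made for illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]

def mat_scale(c, a):
    return [[c * a[i][j] for j in range(3)] for i in range(3)]

def bracket(a, b):
    """Lie bracket [a, b] = ab - ba."""
    return mat_add(mat_mul(a, b), mat_scale(-1, mat_mul(b, a)))

def lie(x, y, z):
    """Strictly upper triangular matrix, an element of the Heisenberg Lie algebra."""
    return [[0, x, z], [0, 0, y], [0, 0, 0]]

def mat_exp(X):
    """exp(X) = I + X + X^2/2, exact here because X^3 = 0."""
    identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
    square = mat_mul(X, X)
    return mat_add(mat_add(identity, X), mat_scale(Fraction(1, 2), square))

def bch_heisenberg(X1, X2):
    """X1 + X2 + [X1, X2]/2, the full BCH series in a 2-step nilpotent algebra."""
    return mat_add(mat_add(X1, X2), mat_scale(Fraction(1, 2), bracket(X1, X2)))
```

In groups of higher step the terms $c_\alpha X_\alpha$ no longer vanish, and this sketch does not attempt to compute them.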

Using this formula, it is a routine matter to see that for any $g_1, g_2 \in G$ and $x \in \mathbb{R}$, we have

$$(g_1 g_2)^x = g_1^x g_2^x \prod_{\alpha} g_\alpha^{Q_\alpha(x)} \qquad \text{(C.1)}$$

where $\alpha$ ranges over a finite set of labels, each $g_\alpha$ is an iterated commutator of $k_1 = k_{1,\alpha}$ copies of $g_1$ and $k_2 = k_{2,\alpha}$ copies of $g_2$ where $k_1, k_2 \geq 1$ and $k_1 + k_2 \geq 2$, and the $Q_\alpha : \mathbb{R} \to \mathbb{R}$ are polynomials of degree at most $k_1 + k_2$ with no constant term.

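In a similar vein, for any $g_1, g_2 \in G$ and $x_1, x_2 \in \mathbb{R}$, we have the formula

$$[g_1^{x_1}, g_2^{x_2}] = [g_1, g_2]^{x_1 x_2} \prod_{\alpha} g_\alpha^{P_\alpha(x_1, x_2)} \qquad \text{(C.2)}$$

where $\alpha$ ranges over a finite set of labels, each $g_\alpha$ is an iterated commutator of $k_1 = k_{1,\alpha}$ copies of $g_1$ and $k_2 = k_{2,\alpha}$ copies of $g_2$ where $k_1, k_2 \geq 1$ and $k_1 + k_2 \geq 3$, and the $P_\alpha : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ are polynomials of degree at most $k_1$ in $x_1$ and at most $k_2$ in $x_2$ which vanish when $x_1 = 0$ or $x_2 = 0$.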
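Both identities can be tested in the Heisenberg group, where only the single commutator $[g_1, g_2]$ survives. In that setting (C.1) should read $(g_1 g_2)^x = g_1^x g_2^x [g_1, g_2]^{Q(x)}$ with $Q(x) = -\binom{x}{2} = (x - x^2)/2$, and (C.2) should have no correction terms at all. The polynomial $Q$ is derived here for this special case and is not quoted from the text. The sketch uses the convention $[g, h] = g^{-1}h^{-1}gh$, encodes a group element by its three off-diagonal entries, and defines $g^x$ as $\exp(x \log g)$; all names are hypothetical.

```python
from fractions import Fraction

def mul(g, h):
    """Product in the Heisenberg group; (a, b, c) stands for the matrix
    [[1, a, c], [0, 1, b], [0, 0, 1]]."""
    return (g[0] + h[0], g[1] + h[1], g[2] + h[2] + g[0] * h[1])

def inv(g):
    """Inverse of (a, b, c)."""
    return (-g[0], -g[1], -g[2] + g[0] * g[1])

def power(g, x):
    """g^x = exp(x log g) for a rational exponent x."""
    a, b, c = g
    half_ab = Fraction(1, 2) * a * b
    log_corner = c - half_ab                    # corner entry of log g
    return (x * a, x * b, x * log_corner + x * x * half_ab)

def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(inv(g), inv(h)), mul(g, h))

def check_c1(g1, g2, x):
    """(g1 g2)^x == g1^x g2^x [g1, g2]^{Q(x)} with Q(x) = (x - x^2)/2 (2-step case)."""
    Q = (x - x * x) / 2
    rhs = mul(mul(power(g1, x), power(g2, x)), power(commutator(g1, g2), Q))
    return power(mul(g1, g2), x) == rhs

def check_c2(g1, g2, x1, x2):
    """[g1^x1, g2^x2] == [g1, g2]^(x1 x2), with no further terms in the 2-step case."""
    lhs = commutator(power(g1, x1), power(g2, x2))
    return lhs == power(commutator(g1, g2), x1 * x2)
```

Note that $Q$ has degree $2 = k_1 + k_2$ and no constant term, in line with the general statement; groups of step three or more would require the additional factors $g_\alpha$.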